
John Cherian

Ellicott City, Maryland

Summary

With over 16 years of experience delivering enterprise application, data engineering, and machine learning solutions for Fortune 100 companies, I have developed deep expertise in AWS data architecture, data engineering, and machine learning. My skills in Spark, Python, AWS data analytics, and machine learning have enabled me to advise clients on technical architecture, technology stack selection, development patterns and standards, minimum viable product (MVP) methodology, and building and coaching in-house teams. I am proud to be the main contributor to the Spark on AWS Lambda open-source project and a contributor to Data on EKS, AWS ECS streaming, and the AWS SageMaker SDK on AWS GitHub.

Overview

17 years of professional experience
1 certification

Work History

Sr. AWS Data Lab Architect

Amazon Web Services
01.2021 - Current
  • Led and built a successful open-source GitHub project with 12+ contributors that brings low-cost Spark integration to AWS customers; the project was subsequently adopted by the AWS Lambda product team. Led the development effort, collaborated with other contributors, and ensured high standards of quality and performance. Experienced in AWS technologies including Lambda, S3, and DynamoDB, with a passion for finding innovative solutions to complex problems.
  • Led a team of 5+ members on an internal AWS project, Apache NiFi on AWS EKS, which enabled cross-cloud access between AWS and Azure through a secure, dedicated private connection.
  • Collaborated with CTOs and CDOs of AWS customers to develop data strategies and production paths, resulting in successful implementation and utilization of Data Lake, Data Mesh, and Machine Learning solutions.
  • Built low-latency and batch pipelines using AWS Glue, AWS EMR, AWS Lambda, and PySpark to meet diverse organizational needs.
  • Developed and implemented data strategy and architecture for AWS Data Mesh and Central/Federated governance patterns to unlock the full potential of data while ensuring quality, consistency, and compliance.
  • Designed and implemented secure and scalable machine learning solutions for AWS clients, working closely with customers to translate business needs into technical requirements on the AWS platform.
  • Designed and implemented end-to-end MLOps pipelines that deploy machine learning models into production rapidly and continuously while optimizing performance and cost. Implemented security and governance policies to ensure data privacy, compliance, and risk management for high-quality machine learning solutions.
  • Actively contributed to AWS GitHub open-source projects and big data blogs, including:
    • Spark on AWS Lambda: https://github.com/aws-samples/spark-on-aws-lambda
    • Data on EKS (AWS Labs), Apache NiFi streaming platform: https://awslabs.github.io/data-on-eks/docs/streaming-platforms/nifi
    • AWS ECS streaming: https://github.com/aws-samples/aws-ecs-data-streaming
  • Authored a blog post on low-cost Spark on AWS Lambda, which was later adopted by the AWS Lambda product team, demonstrating a deep understanding of data and big data technologies on the AWS platform.

Sr. Manager

Accenture LLP
11.2014 - 01.2021
  • Led an AWS data strategy project covering data management and operational governance; data lake strategy and use cases; data security, audit policies, and governance; and data migration from on-premises systems to an AWS data lake.
  • Collaborated with clients to select the most suitable storage and compute engines for their AWS data lakes based on their use cases, and to define the logical and physical separation of data into zones.
  • Developed approaches for data lake metadata management, data classification by latency requirements, assessment of data quality requirements, and data flow notifications, providing clients with a robust and effective AWS data strategy that met their specific needs.
  • Successfully led several delivery projects on AWS/Azure.
  • Assessed the current architecture of on-premises systems and developed a high-level roadmap for migrating to the Snowflake database, ensuring a seamless migration and efficient data management for the client.
  • Established and enforced Python development standards, design patterns, and Spark development practices, and led resource recruitment, to deliver high-quality, optimized solutions that met the client's needs.
  • Developed an ETL Spark framework using a test-driven development approach, led the development team in creating reusable, modular, high-performance PySpark scripts, and automated data engineering workflows with Apache Airflow. Also created a Python API for SCD Type 2 to fill a gap in Snowflake's features, ensuring efficient data workflows and timely project delivery.

Senior Consultant

Capgemini
09.2011 - 11.2014
  • Designed and developed an enterprise data warehouse and reporting data mart, enabling efficient data storage and reporting capabilities for the organization.
  • Led an ETL team to build an Operational Data Store that loaded data into a Salesforce instance, streamlining data management and improving workflow efficiency.
  • Worked on a Master Data Management (MDM) project for a retail client's supply chain management, ensuring accurate and consistent data across systems and processes.
  • Built a proof of concept for a hybrid big data project on a single-cluster Cloudera deployment, demonstrating the benefits of big data and highlighting opportunities for improved data management and analysis.
  • Provided budget recommendations for projects, outlining financial needs and forecasts.

Consultant

AMGEN Life Science
10.2010 - 01.2011

Built a data store and ETL mappings to load sales and IMS data. Worked with IMS, formulary, and web application data sources. Assisted the development team in modifying existing web applications.

Sr. Clinical Data Analyst

Massachusetts Institute of Technology
08.2010 - 10.2010
  • Performed clinical data analysis for Alzheimer's disease research. Retrieved clinical data from sources such as LONI IDA and created reports for clinical studies. Tested data prediction algorithms on historical data.


Clinical Data Analyst

University of Buffalo, Center for Excellence
05.2008 - 08.2010

Performed clinical data analysis for various projects at the University of Buffalo using mathematical and statistical tools. Optimized the process using correlation and regression analysis of various environmental factors.

Data Analyst

Sutherland Global Services
10.2006 - 08.2007

Performed database administration and data loading for call centers, ensuring the accuracy, consistency, and accessibility of data. Used SQL Server Integration Services (SSIS) and Talend to develop and implement efficient data loading processes.

Education

MBA - Business in Biotechnology

University of Buffalo
Buffalo, NY
05.2010

B.Tech - Industrial

Anna University
India
05.2006

Skills

  • AWS Data Analytics Ecosystem
  • Azure Data Analytics Ecosystem
  • AWS Machine learning Ecosystem
  • Data Lake - AWS Lake Formation, Apache Hudi, Apache Iceberg, and Delta Lake
  • Big data - Spark programming for streaming and batch, Hadoop, Hive, Parquet and ORC formats
  • Python frameworks - Pandas, Airflow, AWS Boto3, AWS CDK
  • Infrastructure as code - AWS CloudFormation, AWS CDK, Terraform
  • Container-based - AWS ECS, AWS Lambda, AWS EKS
  • Machine learning - AWS SageMaker, MLflow, MLOps, custom ML model implementation

Accomplishments

  • AWS GitHub Contributor: 3 Major Contributions
  • AWS Certified Solutions Architect
  • Big data blogger

Certification

AWS Certified Solutions Architect
