
John Cherian

Ellicott City, Maryland

Summary

With over 16 years of experience delivering Enterprise application, Data Engineering, and Machine Learning solutions for Fortune 100 companies, I have become an expert in AWS data architecture, Data Engineering, and Machine Learning. My skills in Spark, Python, AWS data analytics, and Machine Learning have enabled me to advise clients on technical architecture, technology stack selection, development patterns and standards, minimum viable product (MVP) methodology, and building and coaching in-house teams. I am proud to be the main contributor to the Spark on AWS Lambda open-source project, Data on EKS, AWS ECS streaming, and the AWS SageMaker SDK on AWS GitHub.


Work History

Sr. AWS Data Lab Architect

Amazon Web Services

District of Columbia

01.2021 - Current

Led and built a successful open-source GitHub project with 12+ contributors that brings low-cost Spark integration to AWS customers; the project was subsequently adopted by the AWS Lambda product team. Led the development effort, collaborated with other contributors, and ensured high standards of quality and performance. Experienced in AWS technologies including Lambda, S3, and DynamoDB, with a passion for finding innovative solutions to complex problems.
Successfully led a team of 5+ members on an AWS internal project, Apache NiFi on AWS EKS, that enabled cross-cloud access between AWS and Azure through a secure, dedicated private connection.
Collaborated with CTOs and CDOs of AWS customers to develop data strategies and paths to production, resulting in the successful implementation and adoption of Data Lake, Data Mesh, and Machine Learning solutions.
Built low-latency and batch pipelines using AWS Glue, AWS EMR, AWS Lambda, and PySpark to meet diverse organizational needs.
Developed and implemented data strategy and architecture for AWS Data Mesh and Central/Federated governance patterns to unlock the full potential of data while ensuring quality, consistency, and compliance.
Designed and implemented secure and scalable machine learning solutions for AWS clients, working closely with customers to translate business needs into technical requirements on the AWS platform.
Expertise in end-to-end MLOps pipeline design and implementation, deploying machine learning models into production rapidly and continuously while optimizing performance and cost. Implemented security and governance policies to ensure data privacy, compliance, and risk management for high-quality machine learning solutions.
Actively contributed to AWS GitHub open-source projects and big data blogs:
Spark on AWS Lambda - https://github.com/aws-samples/spark-on-aws-lambda
Data on EKS (AWS Labs), contributor - https://awslabs.github.io/data-on-eks/docs/streaming-platforms/nifi
AWS ECS streaming - https://github.com/aws-samples/aws-ecs-data-streaming
Authored a blog post on low-cost Spark on AWS Lambda, which was later adopted by the AWS Lambda product team, demonstrating a deep understanding of big data technologies on the AWS platform.

Sr. Manager

Accenture LLP

New York, NY

11.2014 - 01.2021

Led an AWS data strategy project addressing critical areas such as data management and operational governance, data lake strategy and use cases, data security, audit policies and governance, and data migration from on-premises systems to an AWS data lake.
Collaborated with clients to select the most suitable storage and compute engines for their AWS data lake based on their use cases, and to define the logical and physical separation of data phases into zones.
Developed strategies for data lake metadata management, data classification based on latency requirements, data quality requirements assessment, and data flow notification strategy. Provided clients with a robust and effective AWS data strategy that met their specific needs and requirements.
Successfully led several delivery projects on AWS/Azure.
Assessed the current architecture of on-premises systems and developed a high-level roadmap for migrating to the Snowflake database, ensuring a seamless migration and efficient data management for the client.
Established and enforced Python development standards, design patterns, and Spark development practices, and oversaw resource recruitment, to ensure high-quality, optimized solutions that met the client's needs.
Developed an ETL Spark framework using a test-driven development approach, led the development team in creating reusable, modular, high-performance PySpark scripts, and automated data engineering workflows using Apache Airflow. Also created a Python API for SCD Type 2 processing to fill a gap in Snowflake's native features, ensuring efficient data workflows and timely project delivery.

Senior Consultant

Capgemini

Danbury, CT

09.2011 - 11.2014

Designed and developed an enterprise data warehouse and reporting data mart, enabling efficient data storage and reporting capabilities for the organization.
Led an ETL team to build an Operational Data Store (ODS) that facilitated loading data into a Salesforce instance, streamlining data management and improving workflow efficiency.
Worked on a Master Data Management (MDM) project for a retail client's supply chain management, ensuring accurate and consistent data across systems and processes.
Built a proof of concept for a hybrid big data project using a single-cluster Cloudera deployment, demonstrating the benefits of big data and highlighting opportunities for improved data management and analysis.
Provided budget recommendations for projects, outlining financial needs and forecasts.

Consultant

AMGEN Life Science

Greenwich, CT

10.2010 - 01.2011

Built a data store and ETL mappings to load sales and IMS data. Worked with IMS, formulary, and web application data sources. Assisted the development team in modifying existing web applications.

Sr. Clinical Data Analyst

Massachusetts Institute of Technology

Cambridge, MA

08.2010 - 10.2010

Performed clinical data analysis for Alzheimer’s disease research. Worked with data sources such as LONI IDA to retrieve clinical data and create reports for clinical studies. Tested data prediction algorithms on historical data.

Clinical Data Analyst

University of Buffalo, Center for Excellence

Buffalo, NY

05.2008 - 08.2010

Performed clinical data analysis for various projects at the University of Buffalo using mathematical and statistical tools. Optimized the process using correlation and regression analysis of various environmental factors.

Data Analyst

Sutherland Global Services

10.2006 - 08.2007

Worked as a database administrator loading data for call centers, ensuring the accuracy, consistency, and accessibility of data. Used tools such as SQL Server Integration Services (SSIS) and Talend to develop and implement efficient data loading processes.

Education

MBA - Business in Biotechnology

University of Buffalo

Buffalo, NY

05.2010

B.Tech - Industrial

Anna University

India

05.2006

Skills

AWS Data Analytics Ecosystem
Azure Data Analytics Ecosystem
AWS Machine Learning Ecosystem
Data Lake - AWS Lake Formation, Apache Hudi, Apache Iceberg, and Delta Lake
Big Data - Spark programming for streaming and batch, Hadoop, Hive, Parquet and ORC formats
Python frameworks - pandas, Airflow, AWS Boto3, AWS CDK
Infrastructure as code - AWS CloudFormation, AWS CDK, Terraform
Container-based - AWS ECS, AWS Lambda, AWS EKS
Machine learning - AWS SageMaker, MLflow, MLOps, custom ML model implementation

Accomplishments

AWS GitHub Contributor: 3 Major Contributions
AWS Certified Solutions Architect
Big data blogger

Certification

AWS Certified Solutions Architect
