Mohammed Imran Khan

Burlington, USA

Summary

Lead Data Engineer with over nine years of experience developing efficient ETL pipelines and strong proficiency in Python, PySpark, and Kubernetes. Focused on improving data architecture, implementing real-time analytics solutions, and working across teams to meet strategic goals. Uses Apache Airflow and Spark SQL to improve data accessibility and support informed decision-making.

Overview

10 years of professional experience
1 Certification

Work History

Lead Data Engineer

Personify Health Inc.
Burlington, MA
09.2023 - Current
  • Collaborate with the Product, Reporting, and Infrastructure teams to understand data exposure requirements and support project deployment to Production.
  • Design and implement ETL/ELT data-loading pipelines in Python with PySpark or Spark SQL, orchestrated in Apache Airflow.
  • Lead the team and provide support to keep deliverables on track.
  • Communicate across teams as needed so that deliverables and issues are addressed efficiently.
  • Designed and implemented robust ETL/ELT pipelines, enhancing data loading efficiency and enabling real-time analytics for strategic decision-making.
  • Analyzed data requirements across teams, facilitating seamless deployments and minimizing downtime during production launches.
  • Fostered strong partnerships with Product and Infrastructure teams, ensuring alignment on project goals and timely resolution of challenges.
  • Spearheaded the orchestration of data processes using Apache Airflow, achieving measurable improvements in workflow automation and reliability.
  • Conducted in-depth analysis of data workflows, optimizing ETL/ELT processes for enhanced data accuracy and streamlined reporting.
  • Implemented advanced data orchestration strategies using Apache Airflow, leading to substantial improvements in operational efficiency.

Senior Data Engineer

Virgin Pulse Inc.
Burlington, MA
11.2021 - 09.2023
  • Engineered a data ingestion platform using Python and Apache Airflow, improving ETL efficiency and giving stakeholders timely insights for decision-making.
  • Collaborated with the Product, Reporting, and Infrastructure teams to understand requirements and support the project's deployment.
  • Designed robust APIs in Java/Groovy, streamlining data access for applications and improving overall application performance.
  • Implemented Kubernetes for container management, optimizing deployment processes and increasing system reliability across data pipelines.
  • Fostered a culture of teamwork and knowledge sharing, contributing to a collaborative environment that enhanced project delivery.

Data Engineer

Welltok Inc.
Burlington, MA
06.2018 - 10.2021
  • Spearheaded the development of a streamlined data pipeline using Databricks, improving data accessibility and reducing latency.
  • Enhanced the legacy data pipeline architecture, significantly reducing data retrieval times and improving user satisfaction.
  • Developed robust APIs in Java/Groovy, facilitating seamless data integration and elevating application performance.
  • Implemented rigorous testing protocols for ETL modules, ensuring compliance with quality standards and enhancing system reliability.
  • Fostered strong collaboration across Product and Infrastructure teams, aligning project goals and expediting deployment timelines.

Cassandra Administrator

Tata Consultancy Services
Baltimore, MD
03.2016 - 06.2018
  • Managed three Cassandra environments of 120, 9, and 9 nodes, ensuring 24/7 availability, significantly reducing downtime, and enhancing system reliability.
  • Designed and implemented data pipelines using Apache Spark and Scala, streamlining data transfer and improving processing efficiency.
  • Utilized Jenkins and Ansible for automated deployments, decreasing manual intervention and resulting in faster updates across the cluster.
  • Partnered with development teams to optimize database performance, leading to noticeable improvements in application response times.

Informatica Project Lead

Tata Consultancy Services
Baltimore, MD
03.2015 - 02.2016
  • Led cross-functional meetings with business stakeholders to clarify project requirements, ensuring alignment and fostering a shared vision.
  • Implemented advanced ETL processes using Informatica and Teradata, driving substantial improvements in data processing efficiency and accuracy.
  • Analyzed existing data workflows and introduced optimized Shell scripts, resulting in noticeable gains in operational speed and reduced overhead.
  • Developed and refined data models in collaboration with Data Architects, enhancing data integrity and supporting informed decision-making across teams.
  • Worked with the QA team during testing phases, addressing critical defects promptly to maintain project timelines and quality standards.

Education

Master of Science - Information Technology

University of Mumbai
Mumbai, India
08.2010

Bachelor of Science - Information Technology

University of Mumbai
Mumbai, India
07.2008

Skills

  • Python
  • PySpark
  • Spark Streaming
  • Spark SQL
  • Shell scripting
  • CQL
  • SQL
  • Snowflake
  • Cassandra
  • Postgres
  • Teradata
  • Oracle
  • SQL Server
  • AWS
  • Apache Kafka
  • RabbitMQ
  • DSE
  • OpsCenter
  • Databricks
  • Informatica PowerCenter
  • Talend
  • SSDT
  • Apache Airflow
  • Appworx
  • Splunk
  • OpenSearch
  • Kubernetes
  • Docker
  • GitHub
  • GitLab
  • Jenkins
  • ETL development
  • Kubernetes deployment
  • Data modeling
  • Performance optimization

Certification

  • AWS Certified Cloud Practitioner
  • DataStax Certified Professional on Apache Cassandra
  • Sun Certified Java Programmer
