Experienced, results-oriented, resourceful, and problem-solving Data Engineer with 8 years of diverse experience in the Information Technology field, including the development and implementation of applications in cloud environments for storing, querying, and processing data.
Responsive expert experienced in monitoring database performance, troubleshooting issues, and optimizing database environments. Possesses strong analytical skills, excellent problem-solving abilities, and a deep understanding of database technologies and systems. Equally confident working independently or collaboratively, with excellent communication skills.
Overview
8 years of professional experience
1 Certification
Work History
Senior Data Engineer
Verana Health
Charlotte, NC
10.2021 - Current
Led the design and implementation of an end-to-end, scalable, secure data pipeline architecture on Databricks and AWS, resulting in a 40% improvement in data processing efficiency.
Collaborated with cross-functional teams to understand business requirements and translate them into technical solutions aligned with healthcare data compliance standards.
Led the integration of diverse data sources into Databricks, utilizing Unity Catalog to manage metadata and access controls, ensuring seamless data availability for analytics and reporting across the organization.
Automated ETL workflows using Apache Airflow and Databricks, resulting in a 25% improvement in data availability and freshness for analytics.
Conducted performance tuning and optimization of Databricks clusters to enhance data processing speed and efficiency.
Crafted generic Spark user-defined functions (UDFs) to execute record-level business logic (see the first sketch after this section).
Built an event-driven pipeline leveraging SNS, SQS, AWS Lambda, Glue, and AWS Step Functions.
Orchestrated seamless integration of Databricks notebooks and jobs into CI/CD pipelines for automated testing and deployment.
Implemented dependency management within Airflow so that Databricks jobs execute in a coherent, orderly sequence (see the DAG sketch after this section).
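A minimal sketch of the generic record-level UDF pattern described above; the function, column, and table names are hypothetical illustrations, not the production code.

```python
# Minimal sketch of a record-level Spark UDF (names are hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").getOrCreate()

@F.udf(returnType=StringType())
def normalize_patient_id(raw_id):
    """Record-level business logic: strip padding and enforce a fixed-width ID."""
    if raw_id is None:
        return None
    return raw_id.strip().upper().zfill(10)

df = spark.table("raw.encounters")  # hypothetical source table
df = df.withColumn("patient_id", normalize_patient_id(F.col("patient_id_raw")))
```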
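A minimal sketch of Airflow dependency management gating Databricks jobs, assuming the Databricks provider package is installed; the DAG name, task IDs, and job IDs are hypothetical.

```python
# Sketch: Airflow task dependencies enforcing an ordered run of Databricks jobs.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

with DAG(
    dag_id="databricks_pipeline",          # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = DatabricksRunNowOperator(task_id="ingest", job_id=101)      # hypothetical job IDs
    transform = DatabricksRunNowOperator(task_id="transform", job_id=102)
    publish = DatabricksRunNowOperator(task_id="publish", job_id=103)

    # Each downstream task runs only after its upstream dependency succeeds.
    ingest >> transform >> publish
```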
Senior Data Engineer
Nike
Beaverton, OR
04.2020 - 10.2021
Worked with the APLA (Asia Pacific & Latin America) Enterprise Data Analytics team on data ingestion, transformation, and consumption views for the Nike Direct forecast, built on AWS using EMR, S3, Lambda, PySpark, Hive, Airflow, and Snowflake.
Integrated various source systems and analyzed data to support pre-channel and post-channel sales transformation.
Engineered a robust pipeline to ingest, aggregate, and load consumer response data into Hive external tables on AWS S3, then published the data to Snowflake as Tableau dashboard data sources.
Consumed near-real-time data using Spark Streaming with Kafka as a high-throughput data pipeline system (see the streaming sketch after this section).
Handled JSON datasets, crafting custom Python functions to parse JSON data with Spark.
Built an AWS Lambda function with boto3 to automate deregistration of unused AMIs across all application regions, reducing EC2 resource costs (sketched after this section).
Created, debugged, scheduled, and monitored Airflow jobs for ETL batch processing, ensuring seamless loading into Snowflake for analytical processes.
Orchestrated job schedules through Airflow, including custom operators for enhanced flexibility and efficiency (see the operator sketch after this section).
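A minimal sketch of near-real-time consumption from Kafka with JSON parsing in Spark, shown here with the Structured Streaming API; the broker, topic, schema, and S3 paths are hypothetical.

```python
# Sketch: consume JSON events from Kafka and parse them with an explicit schema.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

event_schema = StructType([                 # hypothetical event layout
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
       .option("subscribe", "consumer-response")           # hypothetical topic
       .load())

# Kafka delivers bytes; cast the value to a string, then parse the JSON payload.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", event_schema).alias("e"))
          .select("e.*"))

query = (events.writeStream
         .format("parquet")
         .option("path", "s3://bucket/events/")             # hypothetical paths
         .option("checkpointLocation", "s3://bucket/checkpoints/")
         .start())
query.awaitTermination()
```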
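A minimal sketch of the boto3 Lambda for AMI cleanup; the "in use" check (AMIs referenced by existing instances) is a simplifying assumption about the original cost logic.

```python
# Sketch: Lambda that deregisters self-owned AMIs not referenced by any instance.
import boto3

def lambda_handler(event, context):
    ec2_client = boto3.client("ec2")
    regions = [r["RegionName"] for r in ec2_client.describe_regions()["Regions"]]
    for region in regions:
        ec2 = boto3.client("ec2", region_name=region)
        # All AMIs owned by this account in the region.
        images = ec2.describe_images(Owners=["self"])["Images"]
        # AMIs currently referenced by instances (running or stopped).
        in_use = set()
        for page in ec2.get_paginator("describe_instances").paginate():
            for reservation in page["Reservations"]:
                for instance in reservation["Instances"]:
                    in_use.add(instance["ImageId"])
        for image in images:
            if image["ImageId"] not in in_use:
                ec2.deregister_image(ImageId=image["ImageId"])
```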
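A minimal sketch of a custom Airflow operator of the kind described above; the class name, connection ID, and row-count check are hypothetical, assuming the Snowflake provider package.

```python
# Sketch: custom operator that fails a task if a Snowflake load produced too few rows.
from airflow.models.baseoperator import BaseOperator

class SnowflakeLoadCheckOperator(BaseOperator):
    """Hypothetical operator: verify a minimum row count after a load."""

    def __init__(self, table: str, min_rows: int = 1, **kwargs):
        super().__init__(**kwargs)
        self.table = table
        self.min_rows = min_rows

    def execute(self, context):
        # Imported inside execute() so the scheduler can parse the DAG cheaply.
        from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook

        hook = SnowflakeHook(snowflake_conn_id="snowflake_default")
        count = hook.get_first(f"SELECT COUNT(*) FROM {self.table}")[0]
        if count < self.min_rows:
            raise ValueError(f"{self.table} has {count} rows; expected >= {self.min_rows}")
        self.log.info("%s row count OK: %s", self.table, count)
```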
Data Engineer
Aetna
Hartford, CT
05.2018 - 04.2020
Worked with data stewardship and analytics teams to migrate existing on-prem data pipelines to GCP using cloud-native tools such as GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil, the bq command-line utility, Cloud Composer, PySpark, Python, Dataproc, and BigQuery.
Handled large datasets using partitioning, Spark in-memory capabilities, and efficient joins and transformations during the ingestion process itself.
Developed a generic Python script to validate data between source files and BigQuery tables, and maintained an archival process in a GCS bucket (see the validation sketch after this section).
Built and architected multiple end-to-end ETL and ELT data pipelines for data ingestion and transformation in GCP.
Created BigQuery authorized views for row-level security and for exposing data to other teams (see the second sketch after this section).
Used Cloud Composer to run end-to-end data pipelines, scheduling jobs and their dependencies.
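A minimal sketch of source-to-BigQuery row-count validation, assuming the source is a headered CSV in GCS; the bucket, blob, and table names are hypothetical.

```python
# Sketch: compare a GCS source file's row count against its BigQuery target table.
from google.cloud import bigquery, storage

def validate_row_counts(bucket_name: str, blob_name: str, table_id: str) -> bool:
    # Count data rows in the source CSV (line count minus the header).
    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    source_rows = blob.download_as_text().count("\n") - 1

    # Count rows in the target BigQuery table.
    client = bigquery.Client()
    result = client.query(f"SELECT COUNT(*) AS n FROM `{table_id}`").result()
    target_rows = next(iter(result)).n

    return source_rows == target_rows

# Hypothetical usage:
# ok = validate_row_counts("landing-bucket", "claims/2020-01.csv", "proj.ds.claims")
```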
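A minimal sketch of creating an authorized view with the google-cloud-bigquery client; the project, datasets, and row filter are hypothetical.

```python
# Sketch: expose a filtered view of a private dataset without granting direct access.
from google.cloud import bigquery

client = bigquery.Client()

# The view bakes a row-level restriction into its query.
view = bigquery.Table("my-project.shared_views.claims_ne")   # hypothetical names
view.view_query = """
    SELECT claim_id, amount
    FROM `my-project.private_data.claims`
    WHERE region = 'NE'
"""
view = client.create_table(view)

# Authorize the view against the private dataset so consumers query the view only.
dataset = client.get_dataset("my-project.private_data")
entries = list(dataset.access_entries)
entries.append(bigquery.AccessEntry(None, "view", view.reference.to_api_repr()))
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])
```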
Big Data Engineer
Bank of America
Charlotte, NC
02.2016 - 05.2018
Worked with the Wholesale Loss Forecasting team, part of Bank of America's Global Risk Analytics platform, to support federally mandated CCAR (Comprehensive Capital Analysis & Review) cycles.
Developed PySpark scripts that reduced organizational costs by 30% by migrating legacy Teradata and Oracle systems into a Hadoop data lake.
Loaded data from source systems (Teradata and Oracle) into HDFS using Sqoop and then into partitioned Hive tables.
Migrated existing HiveQL logic to Spark SQL applications for data transformation and aggregation, writing results back to Hive tables (see the Spark SQL sketch after this section).
Implemented dimensional data modeling to deliver multidimensional star and snowflake schemas, normalizing dimension tables as appropriate in the data lake.
Worked extensively with Impala for low-latency querying of Hive tables exposed to end users.
Developed application-specific common utilities in Python to perform data quality (DQ) checks before data is consumed by subsequent processes or published to downstream users (see the DQ sketch after this section).
Developed Oozie workflows and sub-workflows to orchestrate Sqoop scripts, Hive queries, and Spark scripts, automating the ETL process.
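A minimal sketch of a HiveQL aggregation re-expressed in Spark SQL; the database and table names are hypothetical.

```python
# Sketch: run a former HiveQL aggregation as Spark SQL and write back to Hive.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-to-sparksql")
         .enableHiveSupport()          # lets Spark read and write Hive tables
         .getOrCreate())

agg = spark.sql("""
    SELECT portfolio, SUM(exposure) AS total_exposure
    FROM risk.loans                    -- hypothetical source table
    GROUP BY portfolio
""")

# Overwrite the target Hive table with the aggregated result.
agg.write.mode("overwrite").saveAsTable("risk.portfolio_exposure")
```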
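A minimal sketch of a generic PySpark DQ utility of the kind described above; the specific checks (key uniqueness, null completeness) are hypothetical examples.

```python
# Sketch: reusable data-quality checks run before data is published downstream.
from pyspark.sql import DataFrame
from pyspark.sql import functions as F

def run_dq_checks(df: DataFrame, key_cols: list, not_null_cols: list) -> dict:
    """Return a dict of named check results; callers fail the pipeline on any False."""
    results = {}
    # Uniqueness: no duplicate business keys.
    dupes = df.groupBy(key_cols).count().filter(F.col("count") > 1).count()
    results["unique_keys"] = dupes == 0
    # Completeness: required columns contain no nulls.
    for col in not_null_cols:
        results[f"{col}_not_null"] = df.filter(F.col(col).isNull()).count() == 0
    return results

# Hypothetical usage:
# checks = run_dq_checks(loans_df, ["loan_id"], ["portfolio", "exposure"])
# assert all(checks.values()), f"DQ failures: {checks}"
```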
Education
Master of Science - Computer Science
Monmouth University
West Long Branch, NJ
12.2015
Bachelor of Technology - Computer Science and Engineering