Optimized data processing workflows using Dataproc and BigQuery, reducing processing time by 30%
Built and deployed multiple ETL workflows to clean wholesaler data acquired from IQVIA using GCP
Storage, Dataproc, BigQuery, and Composer
Created PySpark scripts in AutoML to test model efficiency and deployed them to production using
Composer and Oozie
Designed and built ML model deployment and monitoring solutions in production
Experience with data validation, score model validation, ML training tests, model staleness tests, model
performance tests, integration tests, and unit tests
Exposure to Spark ML algorithms and PySpark for in-memory computation on large datasets
Migrated over 50 ML models from on-prem Hadoop to the GCP production environment
Built data models and dimensional models with 3NF, star, and snowflake schemas and
views for OLAP and operational data store (ODS) applications
Implemented containerization and orchestration for CI/CD using Docker images and containers
Provided weekend on-call support, ensuring timely delivery of model scores to business teams
Data Engineer
Amgen Inc
Tampa, FL
07.2022 - 06.2023
Worked on multiple Master Data Management (MDM), Multichannel Marketing (MCM), and sales data
workflows to streamline the data exported to Redshift from SQL, Excel, and CSV sources
Developed pipelines for SAP, ADC, and Veeva CRM data for healthcare products and customers using
Databricks; validated data for 120 countries using Amazon S3
Set up Airflow workflows for over 10 ETL pipelines in Databricks
Used Boto3 to configure outbound file delivery to the data reporting layer through Airflow
Successfully onboarded the Google Analytics 360 source system onto Databricks and validated it
Data Science Intern
Technologies LLC
Hartford, CT
06.2021 - 12.2021
Built ETL pipelines using Pentaho and PostgreSQL, standardizing the metrics used in Power BI reports
Drove a data migration initiative to handle client requests better, lowering required manpower by 43% and
request turnaround time (TAT) by 30%; received the client's Most Prominent Newcomer Award for business impact
Built data models and dimensional models with 3NF, star, and snowflake schemas and views
for OLAP and operational data store (ODS) applications
Data Engineer
Tech Mahindra Limited
Hyderabad, IN
11.2015 - 11.2019
Performed root cause analysis using SQL and Python
Partnered with cross-functional teams to design process improvements that raised billing accuracy by 27%
Designed and maintained a Microsoft SQL Server database and used SSIS to enable faster data processing
Created automated Tableau reports for management based on customer KPIs
Used Bash scripts to build analytics pipelines in the cloud and work with data stored across many files
Implemented and scheduled data engineering workflows using Airflow in Python
Successfully delivered a POC on sales trends using PySpark on Databricks, with impactful visualizations
Awards: Received the Bravo award twice at Tech Mahindra for outstanding performance, in
2017 and 2019
Databricks Certified Professional with 6 years of experience, proficient in Big Data, Cloud, SQL, Python, and Spark
EXPERIENCE
Timeline
GCP Data Engineer
StaffWorxs, Walmart
06.2023 - Current
Data Engineer
Amgen Inc
07.2022 - 06.2023
Data Science Intern
Technologies LLC
06.2021 - 12.2021
Data Engineer
Tech Mahindra Limited
11.2015 - 11.2019
MS - Business Analytics
University of Connecticut School of Business
Bachelor's in Engineering
JNTU College of Engineering