Mounika C

Mechanicsburg, PA

Summary

  • Over 5 years of experience as a Data Engineer with expertise in Python, ETL, Informatica, Spark, the Hadoop ecosystem, AWS, and Snowflake
  • Extensive experience deploying cloud-based applications using Amazon Web Services such as Amazon EC2, S3, RDS, IAM, Auto Scaling, CloudWatch, SNS, Athena, Glue, Kinesis, Lambda, EMR, Redshift, and DynamoDB
  • Developed, implemented, and optimized data pipeline systems and ETL processes
  • Worked on ETL migration by developing and deploying AWS Lambda functions to build a serverless data pipeline whose output is written to the Glue Catalog and queried from Athena
  • Several years of experience in Python programming for application development
  • Prepared ETL scripts using Python, pandas, NumPy, and SQLAlchemy
  • Experience analyzing data using HiveQL, HBase, and custom MapReduce programs written in Python
  • Good knowledge of join, grouping, and aggregation concepts; resolved performance issues in Hive and Spark
  • Experience with dimensional and relational data modelling concepts such as star schema modelling and fact and dimension tables
  • Experience working with NoSQL data stores such as HBase and DynamoDB
  • Experience writing test cases, performing static code analysis, and building CI/CD processes using Git and Jenkins
  • Experience in Object-Oriented Analysis and Design (OOAD) and development

Overview

6 years of professional experience

Work History

Big Data Engineer

Wells Fargo
10.2021 - Current
  • Implemented and maintained large-scale ETL processes using Apache Spark ecosystem and lambda architecture to meet the needs of a global client base
  • Worked with Apache Spark and its components and used Sqoop for big data processing
  • Constructed ETL pipelines for Snowflake ingestion and leveraged Spark Streaming and Apache Kafka for live stream data handling
  • Used Apache HTTP libraries to send data to and consume data from RESTful APIs that enrich data based on a trained machine learning model
  • Improved performance of existing algorithms using SparkContext and Spark SQL
  • Implemented scalable ETL processes to automate data integration from diverse sources, managing both structured and unstructured data effectively
  • Oversaw optimization of PostgreSQL database, ensuring data availability, consistency, and integrity
  • Collaborated closely with data science and analytics teams to deploy a cloud-based data warehouse on Amazon Redshift
  • Engineered a real-time data processing system using Apache Kafka for immediate analysis of customer interactions
  • Established a comprehensive data lake utilizing Hadoop and Hive, enabling centralized raw data storage
  • Developed resilient and scalable data pipelines with Apache Airflow, streamlining data flow with robust error detection and automatic retry mechanisms
  • Applied multiple programming languages including Python and Scala for building modern, efficient data pipelines
  • Wrote Python unit test code using pytest
  • Analyzed logs and debugged Airflow jobs
  • Created ETL pipelines using dataflow for batch transformations
  • Performed data analysis for various reports by writing SQL queries.

Data Engineer

State Farm
01.2020 - 12.2020
  • Designed and documented the new architecture and development process to convert the existing ETL pipeline into Hadoop-based systems
  • Developed ETL Processes in AWS Glue to migrate data from external sources and S3 files into AWS Redshift
  • Ingested data through cleansing and transformation steps, leveraging AWS Lambda, AWS Glue, and Step Functions
  • Day-to-day responsibilities included developing ETL pipelines into and out of the data warehouse and building major regulatory and financial reports using advanced SQL queries in Snowflake
  • Designed and implemented data models using advanced SQL for storing, retrieving, and manipulating data
  • Used Amazon EMR to process big data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3)
  • Created AWS Lambda functions and assigned IAM roles to schedule Python scripts using CloudWatch triggers to support infrastructure needs (SQS, EventBridge, SNS)
  • Developed a Python script to call REST APIs and extract data to AWS S3
  • Conducted ETL data integration, cleansing, and transformations using AWS Glue Spark scripts
  • Developed a generic framework using Spark for processing and flattening JSON data that is reused by various applications across the enterprise
  • Developed a Spark-based self-service architecture for downstream teams to join tables and perform data transformations.

Big Data Developer

Walgreens
06.2018 - 12.2019
  • Gained hands-on experience installing, configuring, supporting, and managing Hadoop clusters
  • Wrote shell scripts to run multiple Hive jobs, automating incremental updates of Hive tables used to generate Tableau reports for the business
  • Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks using the Spark framework, and handled JSON data
  • Involved in business analysis and technical design sessions with business and technical staff to develop requirements document and ETL design specifications
  • Responsible for the design, development, and data modelling of Spark SQL scripts based on functional specifications
  • Designed and developed extract, transform, and load (ETL) mappings, procedures, and schedules, following the standard development lifecycle
  • Developed Autosys scripts to schedule the Kafka streaming and batch job
  • Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms
  • Worked closely with Quality Assurance, Operations and Production support group to devise the test plans, answer questions, and solve any data or processing issues
  • Developed scripts in BigQuery and connected them to reporting tools
  • Worked on large-scale Hadoop YARN cluster for distributed data processing and analysis using Spark core, Spark SQL, Sqoop, Hive and NoSQL databases
  • Worked in writing Spark SQL scripts for optimizing the query performance
  • Used Sqoop to import and export data between Oracle DB, HDFS, and Hive
  • Developed Spark code using Spark RDD and Spark-SQL/Streaming for faster processing of data
  • Implemented partitioning, data modelling, dynamic partitions, and buckets in Hive for efficient data access.

Education

Skills

  • Airflow, Tableau, SBT, Maven, IntelliJ, PyCharm, Eclipse
  • Python, Scala, SQL, NoSQL
  • Jenkins, GitLab CI/CD
  • Git, GitHub, Bitbucket
  • Talend, Informatica, Apache NiFi
