Monika Bompelly

New Haven, CT

Summary

Transitioning from a data-centric environment with a focus on developing efficient data solutions and optimizing workflows. Skilled in data architecture, database management, SQL, and Python, with a track record of enhancing data-driven decision-making processes. Seeking to apply these transferable skills in a new field, bringing a consultative approach to solving complex problems and improving operational efficiency.

Overview

11 years of professional experience

Work History

Sr. Data Engineer/ Cloud Data Engineer

Tapestry Inc
08.2023 - Current
  • Involved in the complete big data flow of the application, from upstream data ingestion into HDFS to processing and analyzing the data in HDFS
  • Involved in developing a roadmap for migration of enterprise data from multiple data sources, such as SQL Server and provider databases, into S3, which serves as a centralized data hub across the organization
  • Loaded and transformed large sets of structured and semi-structured data from various downstream systems
  • Working knowledge of Data Build Tool (dbt) with Snowflake and experience in writing SQL queries against Snowflake
  • Developed ETL pipelines using Spark and Hive for performing various business-specific transformations
  • Responsible for analyzing business requirements, estimating tasks, and preparing design documents for converting existing Teradata code into Hive/Spark SQL
  • Worked with SageMaker to build, train, and deploy machine learning models, incorporating predictive analytics into data workflows
  • Managed and stored large volumes of data in AWS S3, integrated with AWS Glue Catalog for metadata management
  • Migrated on-prem data warehouses to cloud environments and designed workflows to ensure data integrity and consistency during the migration
  • Worked on building real-time pipelines using Kafka and Spark Streaming
  • Worked closely with our data scientist teams and business consumers to shape the datasets as per the requirements
  • Automated the data pipeline to ETL all the Datasets along with full loads and incremental loads of data
  • Involved in migrating tables from RDBMS into Hive tables using SQOOP and later generated data visualizations using Tableau
  • Scheduled Airflow DAGs to run multiple Hive and Pig jobs, which independently run with time and data availability
  • Designed and deployed serverless applications using AWS Lambda, automating backend processes and event-driven workflows to increase operational efficiency
  • Implemented data partitioning and optimized ETL performance using Spark SQL and Hive to reduce processing time for large datasets by 30%
  • Integrated AWS Lambda to automate real-time ETL jobs, reducing manual intervention and enabling event-driven data processing
  • Used the AWS Glue Catalog with crawlers to pull data from S3 and performed SQL query operations using AWS Athena
  • Integrated real-time data streaming using AWS Kinesis and MSK to ingest and process high-volume data streams for real-time analytics and alerting systems
  • Engineered large-scale data analytics solutions with AWS Redshift, performing complex SQL queries on massive datasets to drive business intelligence and decision-making
  • Hands-on experience managing security with AWS IAM, service roles, and KMS for data encryption
  • Used AWS Secrets Manager to securely manage credentials for accessing sensitive data across multiple AWS services
  • Created custom triggers and integrations with other AWS services, such as AWS S3 and Amazon SNS (Simple Notification Service), to build scalable and responsive applications
  • Loaded JSON documents into a NoSQL database such as MongoDB and deployed the data to the cloud service Amazon Redshift
  • Responsible for using a Flume sink to remove data from the Flume channel and deposit it in a NoSQL database such as MongoDB
  • Worked on building input adapters for data dumps from FTP Servers using Apache Spark
  • Wrote spark applications to perform operations like data inspection, cleaning, loading, and transforming large sets of structured and semi-structured data
  • Developed Spark application with Scala and Spark-SQL for testing and processing of data
  • Made Spark job stats reporting, monitoring, and data quality checks available for each dataset
  • Developed Spark SQL logic that mimics the Teradata ETL logic and points the output delta back to the newly created Hive tables as well as the existing Teradata dimension, fact, and aggregate tables
  • Ensuring data quality, reliability, and integrity across the data pipeline and maintaining a robust data governance framework
  • Technical Stack: AWS S3, AWS Redshift, Jenkins, GIT, Hadoop, Hive, Pig, Sqoop, Oozie, Spark, Scala, Airflow, Oracle, DB2, Salesforce, Mainframe, DataStage, Grafana, Rally, ServiceNow, Unix, DoM.

Sr. Data Engineer

Elevance
01.2022 - 07.2023
  • Developed and managed complex ETL pipelines using Apache Nifi, transforming and loading terabytes of data into AWS Redshift with minimal downtime
  • Automated data quality checks and validation processes using Python and SQL, reducing data errors by 40%
  • Developed ETL processes to extract, transform, and load data from various sources, including SQL Server, Oracle, and MongoDB, ensuring data accuracy and integrity
  • Collaborated with data scientists and analysts to develop data models and algorithms, leading to improved predictive analytics and business insights
  • Managed and optimized data storage solutions using AWS services such as S3, Redshift, and Glue, reducing storage costs by 25%
  • Implemented data governance frameworks and policies, ensuring compliance with GDPR and CCPA regulations
  • Built and maintained real-time data streaming applications using Kafka, enabling real-time analytics and decision-making
  • Utilized Docker to containerize data processing applications, ensuring consistency across different environments
  • Employed Kubernetes for orchestrating containerized data processing workloads, improving scalability and resource utilization
  • Participated in Agile Scrum ceremonies, contributing to sprint planning, daily stand-ups, and retrospectives to improve team productivity
  • Developed complex database objects such as stored procedures, functions, packages, and triggers using SQL and PL/SQL
  • Experience in designing and implementing data warehouse applications, mainly using the ETL tool Talend Data Fabric for big data integration and data ingestion
  • Integrated data quality checks into CI/CD pipelines, ensuring data integrity and reliability in production
  • Led the design and implementation of scalable data pipelines using technologies such as Apache Spark, Apache Kafka, and Apache Nifi to efficiently process and analyze large volumes of data
  • Built application and database servers using AWS EC2, created AMIs, and used RDS for PostgreSQL
  • Carried out deployments and builds on various environments using the continuous integration tool Jenkins
  • Designed the project workflows/pipelines using Jenkins as a CI tool
  • Worked with Informatica Intelligent Cloud Services (IICS) and Informatica Data Quality
  • Implemented Apache Hadoop ecosystem components, including HDFS, MapReduce, Hive, and HBase, to effectively manage and process extensive datasets
  • Automated data extraction processes from multiple sources, including RESTful APIs, databases, and flat files, reducing manual intervention by 50%
  • Solid knowledge of data warehousing, data marts, Operational Data Stores (ODS), and dimensional data modeling (star schema modeling)
  • Expertise in data architecture, data modeling, metadata, data migration, data mining, and data science
  • Evaluating Azure, Collibra, Alation, Informatica data catalog tools
  • Setting up self-service analytics process and standards using Power BI to utilize data assets
  • Involved in writing Linux shell scripts for business processes and loading data from different systems into HDFS
  • Implemented ETL processing, which consists of data transformation, data sourcing, mapping, conversion, and loading
  • Utilized Apache Spark and PySpark to process and analyze large datasets, achieving significant reductions in processing time from hours to minutes
  • Created interactive dashboards in Tableau to visualize key business metrics, empowering stakeholders with actionable insights
  • Developing and maintaining technical roadmap for Enterprise Modern Data Platform for different platform capabilities
  • Developed custom data visualizations in Power BI to illustrate complex data patterns and trends
  • Leveraged AWS S3, EC2, and EMR instances extensively for deploying and testing applications across various environments (DEV, QA, PROD)
  • Utilized Terraform to allow infrastructure to be expressed as code in building EC2, Lambda, RDS, and EMR
  • Built analytical warehouses in Snowflake and queried data in staged files by referencing metadata columns
  • Designed a Data Quality Framework to perform schema validation and data profiling on Spark (PySpark)
  • Utilized the Pandas API to put data into time-series and tabular form for timestamp-based manipulation and retrieval
  • Used the Alation API interface to query data and manage tables
  • Implemented Spark Structured streaming to consume real-time data, build feature calculations from various sources like Data Lake and Snowflake, and produce them back to Kafka
  • As the PL/SQL resource on the project, developed an abstraction layer of complex views to support backward compatibility for legacy data warehouse consumers
  • Extensively worked with Avro and Parquet files and converted data between the two formats
  • Parsed semi-structured JSON data and converted it to Parquet using DataFrames in PySpark
  • Developed a Python Script to load the CSV files into the S3 buckets, created AWS S3 buckets, performed folder management in each bucket, and managed logs and objects within each bucket
  • Created Hive DDL on Parquet and Avro data files residing in both HDFS and S3 buckets
  • Configured Glue Dev Endpoints to point Glue jobs to a specific EMR cluster or EC2 instance
  • Technical Stack: Python, SQL, ETL, Apache Nifi, AWS Redshift, SQL Server, Oracle, MongoDB, S3, Glue, GDPR, CCPA, Kafka, Docker, Kubernetes, Agile, Scrum, CI/CD, Spark, EC2, AMIs, RDS, PostgreSQL, Jenkins, CI, MapReduce, Hive, HBase, RESTful APIs, Flat files, HDFS, PySpark, Tableau, Power BI, EMR, Lambda, Snowflake, Pandas API, Data Lake, JSON, Parquet, CSV, S3 buckets, Avro

Sr. Data Engineer

Nike
07.2021 - 12.2022
  • Automated ETL processes using PySpark DataFrame APIs, reducing manual intervention and ensuring data consistency and accuracy
  • Integrated Azure Databricks into end-to-end ETL pipelines, facilitating seamless data extraction, transformation, and loading
  • Implemented complex data transformations using Spark RDDs, DataFrames, and Spark SQL to meet specific business requirements
  • Developed real-time data processing applications using Spark Streaming, capable of handling high-velocity data streams
  • Developed and implemented data security and privacy solutions, including encryption and access control, to safeguard sensitive healthcare data stored in Azure
  • Enhanced search performance by implementing and maintaining ElasticSearch clusters, reducing query response time by 30%
  • Ensured high availability and fault tolerance by managing ElasticSearch cluster health and scaling
  • Designed and implemented PostgreSQL database schemas and table structures based on normalized data models and relational database principles
  • Created interactive and insightful dashboards and reports in Power BI, translating complex data sets into visually compelling insights for data-driven decision-making
  • Designed efficient HBase schemas for improved data retrieval and storage, decreasing latency and boosting read/write performance
  • Leveraged expertise in Azure Data Factory for proficient data integration and transformation, optimizing processes for enhanced efficiency
  • Managed Azure Cosmos DB for globally distributed, highly available, and secure NoSQL databases, ensuring optimal performance and data integrity
  • Created end-to-end solutions for ETL transformation jobs involving Informatica workflows and mappings
  • Demonstrated extensive experience in ETL tools, including Teradata Utilities, Informatica, and Oracle, ensuring efficient and reliable data extraction, transformation, and loading processes
  • Integrated, transformed, and loaded data from various sources using Spark ETL pipelines, ensuring data integrity and consistency
  • Seamlessly integrated HBase with data processing pipelines, facilitating real-time analytics and data ingestion
  • Utilized Python, including the pandas and NumPy packages, along with Power BI to create various data visualizations, while also performing data cleaning, feature scaling, and feature engineering tasks
  • Developed machine learning models such as Logistic Regression, KNN, and Gradient Boosting with Pandas, NumPy, Seaborn, Matplotlib, and Scikit-learn in Python
  • Designed and coordinated with the Data Science team in implementing advanced analytical models in Hadoop Cluster over large datasets, contributing to efficient data workflows
  • Automated the provisioning of Azure resources using Terraform scripts, ensuring consistent and repeatable environment setups
  • Managed infrastructure changes using Terraform, enabling version-controlled and auditable infrastructure deployments
  • Implemented CI/CD pipelines with Jenkins for automated testing and deployment of ETL processes, reducing manual errors
  • Integrated CI/CD workflows with GitLab for continuous integration and delivery, enhancing the efficiency of development cycles
  • Leveraged Git for version control to manage code changes and collaborate on ETL development, ensuring code quality
  • Coordinated with teams using GitLab repositories, facilitating collaborative development and code reviews
  • Configured Jenkins pipelines to automate the testing and deployment of data integration jobs, improving release management
  • Automated deployments by integrating Jenkins with Azure and containerized ETL workflows with Docker for consistent environments across all stages
  • Utilized Docker to deploy scalable and reproducible environments for data processing applications
  • Deployed containerized data processing applications on Kubernetes clusters for enhanced scalability and reliability
  • Managed Kubernetes deployments using Helm to simplify the deployment and scaling of ETL pipelines
  • Technical Stack: Azure, Azure Data Factory, Azure CosmosDB, ETL, Informatica, PySpark, Azure HDInsight, Apache Spark, Hadoop, Spark-SQL, Scikit-learn, Pandas, NumPy, PostgreSQL, MySQL, Python, Scala, Power BI, SQL.

Data Engineer

United Health Group
04.2019 - 06.2021
  • Developed and maintained data pipelines in Azure Data Factory, integrating data from manufacturing, sales, and customer service for comprehensive analytics
  • Managed Azure Data Lake storage solutions for scalable and secure data storage, enabling efficient data access and analysis across global teams
  • Utilized Azure Databricks for big data processing and analytics, applying machine learning models to predict vehicle performance and maintenance needs
  • Automated deployment processes using Jenkins and Ansible, improving the efficiency and reliability of data infrastructure provisioning
  • Wrote and maintained shell scripts to automate routine data management tasks, enhancing operational efficiency and reducing manual errors
  • Deployed and maintained SSIS packages across multiple environments, ensuring smooth data flow operations between development, staging, and production systems
  • Configured Azure Service Bus and Event Hub for real-time data ingestion and event streaming, facilitating immediate insights into manufacturing and operational data
  • Administered Azure SQL databases and Cosmos DB, optimizing performance and ensuring high availability for critical automotive data applications
  • Deployed containerized applications using Azure Kubernetes Service (AKS) and Azure Container Registry (ACR), supporting scalable and resilient data services
  • Secured sensitive data using Azure Key Vault, implementing best practices for managing secrets, keys, and certificates
  • Managed Azure VM creation and configuration, ensuring optimized resource utilization for data processing and analysis workloads
  • Maintained infrastructure as code (IaC) using YAML templates, streamlining deployment and management of Azure resources
  • Configured and maintained WebLogic and Azure Web App environments, supporting web-based applications and services for internal and customer-facing portals
  • Wrote efficient, scalable code in Python for data processing and automation tasks, contributing to the development of predictive analytics models
  • Managed source code and version control using Git, ensuring code integrity and facilitating team collaboration
  • Coordinated project tasks and tracked progress using Jira, enhancing project visibility and team productivity
  • Configured SSRS report subscriptions and alerts, ensuring timely delivery of reports via email or shared network drives to end users
  • Implemented data governance and compliance measures, aligning data management practices with automotive industry standards and regulations
  • Conducted thorough testing and validation of data pipelines and analytics models, ensuring accuracy and reliability of insights provided to decision-makers
  • Technical Stack: Azure, Red Hat Linux, Jenkins, Ansible, shell scripting, Azure Data Lake, Azure Data Factory, Azure AD, Azure Service Bus, Azure SQL, Cosmos DB, Log Analytics, AKS, Event Hub, Service Bus, Key Vault, App Insights, Azure VM creation, ACR, Azure Function App, Azure Web App, Azure SQL, Azure SQL MI, SSH, YAML, WebLogic, Python, Azure DevOps, Git, Maven, Jira.

Data Engineer

HP
04.2015 - 03.2019
  • Implemented data pipelines on AWS Glue to effectively extract, transform, and load various datasets for Chevron's analytics, improving operational and decision-making insights
  • Responsible for the execution of big data analytics, predictive analytics, and machine learning initiatives
  • Built real-time data pipelines by developing Kafka producers and Spark Streaming applications for processing large-scale data from oil and gas operations
  • Monitored Spark jobs using the UI (Name Node Manager, Resource Manager ETS) in AWS
  • Utilized AWS services with a focus on big data architecture, analytics, enterprise data warehouses, and business intelligence solutions
  • Experience in AWS services like EC2, EMR, DynamoDB, Athena, and Redshift
  • Automated data workflows using Python and Apache Airflow, resulting in increased efficiency and reduced manual errors
  • Developed Spark SQL scripts using PySpark to perform transformations and actions on Data Frames and Data Sets in Spark for faster data processing
  • Created data pipelines for extracting, transforming, and loading data from various sources, including internal and external APIs
  • Conducted performance tuning and optimization of SQL queries on AWS Redshift to enhance data processing efficiency
  • Developed Spark scripts using Python on AWS EMR for data aggregation, cleansing, and mining
  • Developed and maintained data orchestration workflows using AWS Step Functions to manage complex ETL tasks and dependencies
  • Worked together with data scientists to enable real-time model inference through SQS-triggered Lambda functions and to run scripts in response to events in DynamoDB and S3
  • Collaborated with cross-functional teams to understand business requirements and translate them into actionable Tableau visualizations
  • Proficient in using Python for DynamoDB interactions, including the Boto3 library for seamless integration
  • Implemented CRUD operations on DynamoDB tables using Python scripts, ensuring data consistency
  • Generated reports using Python per business requirements and created visualizations
  • Participated in the design, build, and deployment of NoSQL implementations such as MongoDB
  • Added support for Amazon AWS S3 and RDS to host static/media files and the database into Amazon Cloud
  • Conducted extensive code reviews using GitHub pull requests, improving code quality, and led team meetings
  • Managed and processed large datasets using Hadoop MapReduce, improving data processing efficiency
  • Developed scripts to migrate data from a proprietary database to PostgreSQL
  • Followed Agile methodologies and the Scrum process
  • Technical Stack: Python, Django, HTML5, CSS, Bootstrap, jQuery, JSON, JavaScript, PostgreSQL, MongoDB, Ansible, MySQL, Google Cloud, Amazon AWS S3, Bugzilla, JIRA, Hadoop, Hive, Apache Airflow

SQL Developer

HP
09.2013 - 03.2015
  • Involved in the Installation and Configuration of SQL Server 2008 and SQL Server 2012 with the latest Service Packs
  • Used DDL and DML for writing Triggers, Stored procedures, and Data manipulation
  • Created Ad Hoc and Parameterized reports using SQL Server Reporting Services (SSRS)
  • Used performance point services, SSRS, Excel as the reporting tools and wrote the expressions in SSRS wherever necessary
  • Created OLAP-based reports, subreports, bar charts, and matrix reports using SSRS
  • Deployed the SSRS reports in Microsoft Office SharePoint portal server (MOSS) 2012
  • Designed and developed SSIS Packages to import and export data from MS Excel, SQL Server, and Flat files
  • Involved in the development of complex mappings using SSIS to transform and load the data from Oracle into the SQL 2008 R2/2012 Server target staging database
  • Created Linked Servers to connect OLE DB data sources and providers; participated in designing a data warehouse to move information from OLTP to staging and from staging to the enterprise data warehouse for better analysis
  • Conducted and automated the ETL operations to Extract data from multiple data sources, transform inconsistent and missing data to consistent and reliable data, and finally load it into the multi-dimensional data warehouse
  • Developed T-SQL programs required to retrieve data from the data repository using cursors and exception handling, and created T-SQL scripts to monitor deadlocks
  • Developed SQL queries and PL/SQL procedures in Oracle database for the Application
  • Modified the existing Universe and created a new Universe against the Oracle database per the reporting requirements to add new features to the reports
  • Documented Design Documents for reports to provide detailed design and explanation of the reports and documented Unit Test documents to evaluate and validate reports.

Skills

  • Programming: Python, Scala, Java, Golang
  • Cloud Platforms: AWS (EMR, S3, Glue, Redshift), Azure (Data Lake, Databricks), Google Cloud Platform
  • Big Data: Hadoop (HDFS, Hive, Pig, Spark), Apache Kafka, PySpark, MapReduce
  • ETL Tools: Talend, Informatica, Microsoft Integration Services, SnowSQL
  • Databases: SQL Server, NoSQL (DynamoDB, MongoDB, Cassandra)
  • Data Visualization: Tableau, Power BI
  • Infrastructure as Code: Terraform
  • API Development: JWT, OAuth2, API keys
  • CI/CD: Jenkins, Docker, Bitbucket, Git
  • Testing Tools: Apache JMeter, QuerySurge
  • Machine Learning & AI: TensorFlow, PyTorch
  • Methodologies: Agile, Scrum, Test-Driven Development (TDD)

Timeline

Sr. Data Engineer/ Cloud Data Engineer

Tapestry Inc
08.2023 - Current

Sr. Data Engineer

Elevance
01.2022 - 07.2023

Sr. Data Engineer

Nike
07.2021 - 12.2022

Data Engineer

United Health Group
04.2019 - 06.2021

Data Engineer

HP
04.2015 - 03.2019

SQL Developer

HP
09.2013 - 03.2015