Monika Chattu

DATA ENGINEER
Memphis, Tennessee

Summary

  • Nearly 7 years of professional experience in IT data analytics projects, specializing in designing and developing on-premises ETL solutions and modernizing them into AWS cloud-based architectures
  • Proficient in leveraging AWS native services such as AWS Glue for ETL workflows, Amazon Redshift for data warehousing, and Amazon S3 for scalable storage (illustrative Glue job sketch below)
  • Well-versed in orchestration tools like AWS Step Functions and AWS Data Pipeline for seamless workflow automation
  • Extensive experience in building and optimizing data pipelines using Pentaho to facilitate ETL processes, migrating data from MariaDB and flat files to AWS Cloud Services
  • Skilled in implementing serverless ETL solutions using AWS Lambda and processing large-scale datasets with AWS EMR (Elastic MapReduce)
  • Hands-on experience with AWS Database Migration Service (DMS) and Schema Conversion Tool (SCT) for seamless data migration to AWS environments
  • Knowledgeable in AWS Batch for efficient batch processing of large datasets
  • Designed and built real-time data streaming solutions utilizing AWS Kinesis and developed data warehousing solutions on AWS using Amazon Redshift
  • Optimized Amazon S3 storage strategies for cost-effectiveness and improved data retrieval performance
  • Strong understanding of data modeling principles, including Fact and Dimension tables, as well as Snowflake and Star Schema modeling techniques
  • Extensive experience in data warehousing and data mart development using distributed SQL technologies such as Hive SQL, MySQL, and MS SQL
  • In-depth experience working with AWS services like AWS Lambda, AWS Glue, and AWS SDKs to develop and optimize data solutions
  • Expertise in writing SQL-based stored procedures, functions, and complex queries for database design and performance optimization
  • Comprehensive experience in the entire software development lifecycle (SDLC), from requirements analysis and design to development, testing, and deployment
  • Documented best practices, optimization strategies, and performance tuning methodologies for Snowflake-based data pipelines
  • Enhanced collaboration across teams, improving knowledge sharing and driving operational efficiencies in data engineering workflows
  • Improved ETL performance by 50% through optimized data transformations and parallel processing techniques using PySpark
  • Leveraged PySpark for advanced analytics applications, including machine learning model training and predictive analytics within Snowflake, enabling actionable insights
  • Strong understanding of Search Engine Marketing (SEM), Search Engine Optimization (SEO), product ads, and keyword analysis
  • Experienced in data preparation, modeling, and visualization using Power BI, along with expertise in developing interactive dashboards and reports using Tableau
  • Excellent communication and interpersonal skills with a strong ability to quickly adapt and learn new technologies
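
To make the AWS Glue experience concrete, below is a minimal, hypothetical Glue ETL job skeleton in Python (PySpark). It runs inside the AWS Glue environment; the catalog database, table, column mappings, and S3 path are illustrative assumptions, not details from a specific project.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job bootstrap: resolve the job name passed in by the Glue runtime.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Hypothetical source: a Glue Data Catalog table populated by a crawler.
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="invoices")

# Keep and rename only the columns the downstream model needs.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("invoice_no", "string", "invoice_no", "string"),
        ("supplier_id", "string", "supplier_id", "string"),
        ("amount", "double", "amount", "double"),
    ])

# Write curated Parquet to a hypothetical S3 prefix for Redshift/Athena consumption.
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/invoices/"},
    format="parquet")

job.commit()
```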

Overview

10 years of professional experience

Work History

Data Engineer

Copart
08.2022 - Current
  • Worked on loading invoice data from suppliers into JDE tables F0411 and F0911 in an Oracle database
  • Used Informatica PowerCenter Designer to fetch CSV flat files containing invoice data from a secured SFTP server
  • Extracted the data with a Python script, using Pandas to load it into a DataFrame
  • Parsed invoice files containing multiple headers and loaded them into various Snowflake tables (illustrative sketch after this list)
  • Loaded data from CSV files into Snowflake tables using COPY INTO commands and used the Informatica Workflow Manager tool for continuous data flow into the tables
  • Worked on batch processing and monitored error logs in Informatica Workflow Manager to store the details of every batch run
  • Maintained different versions of the integration in GitHub for auditing and deadline estimation
  • Initially loaded 5 out of 30 invoices per day into JDE and improved performance to 90 out of 100 invoices per day
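
A minimal sketch of the multi-header invoice parsing and Snowflake COPY INTO load referenced above, assuming a hypothetical file layout in which a record-type column distinguishes header ("H") and detail ("D") rows; the file names, table names, and connection parameters are illustrative only.

```python
import pandas as pd
import snowflake.connector

# Hypothetical layout: each row starts with a record type ("H" header / "D" detail).
raw = pd.read_csv("invoice_batch.csv", header=None, dtype=str)

headers = raw[raw[0] == "H"].iloc[:, 1:5].copy()
headers.columns = ["invoice_no", "supplier_id", "invoice_date", "amount"]

details = raw[raw[0] == "D"].iloc[:, 1:4].copy()
details.columns = ["invoice_no", "line_no", "line_amount"]

headers.to_csv("invoice_headers.csv", index=False)
details.to_csv("invoice_details.csv", index=False)

# Stage the parsed files to each table's internal stage, then load with COPY INTO.
conn = snowflake.connector.connect(
    account="example_account", user="etl_user", password="***",
    warehouse="LOAD_WH", database="FINANCE", schema="STAGING")
cur = conn.cursor()
for path, table in [("invoice_headers.csv", "INVOICE_HEADER"),
                    ("invoice_details.csv", "INVOICE_DETAIL")]:
    cur.execute(f"PUT file://{path} @%{table} OVERWRITE = TRUE")
    cur.execute(
        f"COPY INTO {table} FROM @%{table} "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1) PURGE = TRUE")
conn.close()
```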

Jr Data Engineer

Copart
08.2022 - 03.2024
  • Involved in Agile Development process (Scrum and Sprint planning)
  • Migrated the entire MemSQL database to AWS cloud infrastructure and created analytical views to generate business-driven reports
  • Successfully migrated multiple MariaDB and MemSQL databases to AWS cloud infrastructure, reducing on-premises maintenance costs by 30% annually
  • Designed and implemented scalable data pipelines, handling up to 1 TB of data daily, resulting in a 40% improvement in data processing efficiency
  • Implemented AWS native services such as Amazon RDS and Amazon Redshift, enhancing database performance by 50% and reducing query response times by 60%
  • Developed automated ETL processes using AWS Glue, reducing data integration time by 50% and ensuring timely availability of critical business insights
  • Optimized database architectures and data models, achieving a 20% reduction in storage costs while maintaining data accessibility and integrity
  • Collaborated cross-functionally with teams to deliver data solutions aligned with business goals, resulting in a 25% increase in data-driven decision-making efficiency
  • Managed compliance requirements during database migrations, achieving 100% adherence to regulatory standards (e.g., GDPR, HIPAA)
  • Mentored interns on AWS best practices and data engineering principles, improving team productivity by 30% and knowledge retention
  • Performed data analysis and data mining on various source systems using SQL
  • Extracted data from data lakes and the EDW into relational databases for analysis, deriving deeper insights using SQL queries and Apache PySpark
  • Developed Airflow DAGs in Python using the Airflow libraries (see the sketch after this list)
  • Built ER diagrams to map the flow of Copart’s operational data across source systems, optimized the data flow, and built an ETL tool to schedule the new data flow to the AWS cloud
  • Optimized SQL queries to leverage Snowflake's query performance features such as query caching and query hints
  • Reduced query execution time by 30% through query profiling and optimization
  • Implemented auto-scaling and manual scaling strategies for Snowflake warehouses based on workload patterns
  • Achieved cost savings of 20% by right-sizing warehouses during off-peak hours
  • Partitioned large tables based on access patterns and query requirements to improve query performance
  • Reduced data scan times by 40% by partitioning tables on frequently queried columns
  • Created materialized views to pre-aggregate data and accelerate query performance for complex analytics queries
  • Improved dashboard loading times by 50% by leveraging materialized views for aggregated reporting
  • Compiled and optimized complex queries to reduce compilation overhead and enhance execution efficiency
  • Achieved a 25% reduction in query compilation time through query optimization techniques
  • Implemented efficient data compression techniques (e.g., automatic, manual) to minimize storage footprint and improve query performance
  • Reduced storage costs by 30% by optimizing data compression settings based on data characteristics
  • Managed query queuing and allocated resources dynamically to prioritize critical workloads and ensure SLA adherence
  • Improved overall system stability and performance by implementing workload management policies
  • Configured and optimized Snowflake's concurrency settings to manage simultaneous query execution and resource contention
  • Increased concurrency limits by 50% through fine-tuning and optimizing concurrency controls
  • Optimized data loading and unloading processes using Snowflake's bulk loading capabilities (e.g., Snowpipe, COPY command)
  • Reduced data loading times by 40% by leveraging parallel loading and efficient file formats (e.g., Parquet, ORC)
  • Designed and optimized Snowflake schemas (e.g., star schema, snowflake schema) to support efficient data querying and analytics
  • Improved query performance by 30% by restructuring schemas based on analytical workload patterns
  • Implemented automated monitoring and alerting for query performance and system metrics using Snowflake's built-in monitoring tools
  • Reduced incident response time by 50% by proactively monitoring and tuning Snowflake resources
  • Optimized ETL workflows and data pipelines using Snowflake's integration with ETL tools (e.g., Informatica, Talend)
  • Achieved a 20% increase in ETL throughput by fine-tuning pipeline configurations and leveraging Snowflake's parallel processing capabilities
  • Implemented best practices for data security (e.g., role-based access control, encryption) and compliance (e.g., GDPR, HIPAA) in Snowflake
  • Enhanced data governance and compliance posture, ensuring adherence to regulatory requirements
  • Optimized backup and disaster recovery strategies using Snowflake's automated backup and replication features
  • Reduced recovery time objectives (RTO) by 40% by implementing efficient backup and restore procedures
  • Documented optimization strategies, configurations, and best practices for Snowflake data pipelines
  • Improved team collaboration and knowledge sharing, enabling continuous improvement and efficiency gains in data operations
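
An illustrative Airflow DAG skeleton in the spirit of the DAG development mentioned in this role; the task callables, DAG id, and schedule are assumptions, not code from an actual pipeline.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_from_edw(**context):
    """Hypothetical extract step: pull the day's records from the EDW."""
    ...


def load_to_redshift(**context):
    """Hypothetical load step: copy the extracted records into Redshift."""
    ...


default_args = {"owner": "data-eng", "retries": 2, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="edw_to_redshift_daily",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    extract = PythonOperator(task_id="extract_from_edw", python_callable=extract_from_edw)
    load = PythonOperator(task_id="load_to_redshift", python_callable=load_to_redshift)
    extract >> load  # run the load only after the extract succeeds
```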

Data Analyst

Mu Sigma
11.2018 - 08.2022
  • Created data pipeline for migrating data of Top Keywords and Stock Keeping Units from Hadoop Hive database to AWS Cloud Infrastructure
  • Utilized AWS EMR clusters to schedule and execute data pipelines transferring data from Hadoop Hive to Amazon Redshift
  • Scheduled and optimized data pipelines using AWS EMR clusters, reducing data transfer times by 40% and enhancing scalability for peak workloads
  • Engineered analytical views and utilized Tableau to deliver a Top Searched Keyword report, facilitating strategic insights for Walmart Media Group and contributing to a 15% increase in advertising effectiveness
  • Deepened understanding of keyword bidding and digital ads monetization, leveraging competitor data from generated reports and achieving a 20% reduction in advertising costs
  • Demonstrated expertise in migrating on-premises workloads to AWS cloud using AWS Glue, Amazon S3, Amazon Redshift, and AWS EMR
  • Ported existing on-premises Hive code to AWS cloud services for enhanced scalability and performance
  • Hands-on experience in migrating on-prem ETL processes to AWS cloud using native tools like AWS Glue, Amazon S3, Amazon Redshift, and AWS EMR
  • Led a data visualization project in Tableau that provided clients insights during new product launches, contributing to a 25% increase in turnover the following financial quarter
  • Implemented comprehensive monitoring and alerting systems using AWS CloudWatch, reducing downtime incidents by 30% and ensuring SLA compliance for critical data pipelines
  • Developed a comprehensive data pipeline for monitoring various KPIs during the holiday season (November – December) from Hadoop Hive to AWS cloud infrastructure
  • Extracted SKU data from a MySQL database, processing 1M records daily with an average extraction time of 10 minutes
  • Utilized PySpark to handle skewed data distributions, achieving an 80% reduction in data skewness and improving processing efficiency (salting sketch after this list)
  • Calculated daily SKU counts accurately for 10,000 products, achieving a 99.5% accuracy rate in data aggregation
  • Implemented ARIMA and Prophet models in Python for SKU demand forecasting, achieving 85% accuracy in predictions
  • Developed Apache Airflow DAGs for automating SKU count and forecasting jobs, reducing manual intervention by 90%
  • Implemented data lineage tracking to document data flow from source to output, ensuring auditability and compliance
  • Monitored pipeline performance metrics using Airflow monitoring tools, optimizing job execution time by 50%
  • Optimized data processing algorithms in PySpark, reducing job execution time from 2 hours to 30 minutes
  • Generated reports and visualizations using matplotlib and Power BI, improving decision-making with SKU count trends
  • Scaled pipeline to handle increased SKU data volumes, maintaining 95% uptime and readiness for future growth
  • Managed resource utilization effectively, achieving high efficiency in data processing and storage costs
  • Implemented CI/CD practices for seamless deployment of pipeline updates, enhancing scalability and reliability
  • Improved stakeholder engagement with insightful visualizations and accurate forecasting, supporting proactive inventory management
  • Increased proactive inventory management strategies by 25% through improved forecasting accuracy
  • Incorporated stakeholder feedback to enhance forecasting models, achieving a 15% improvement in accuracy over six months
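
A sketch of the key-salting pattern often used to handle join skew in PySpark, consistent with the skew-handling bullet above; the table names and salt fan-out are assumptions.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sku-skew-demo").getOrCreate()

NUM_SALTS = 16  # assumed fan-out; tune to the observed skew on sku_id

# Hypothetical inputs: a large fact table skewed on sku_id and a small SKU dimension.
events = spark.table("raw.sku_events")
skus = spark.table("ref.sku_dim")

# Salt the skewed side so rows for a hot sku_id spread across NUM_SALTS buckets.
events_salted = events.withColumn("salt", (F.rand() * NUM_SALTS).cast("int"))

# Replicate the small side once per salt value so the join keys still line up.
salts = spark.range(NUM_SALTS).select(F.col("id").cast("int").alias("salt"))
skus_salted = skus.crossJoin(salts)

# Join on (sku_id, salt): no single partition receives every row for a hot key.
joined = events_salted.join(skus_salted, ["sku_id", "salt"]).drop("salt")

daily_counts = joined.groupBy("event_date", "sku_id").count()
daily_counts.write.mode("overwrite").saveAsTable("curated.daily_sku_counts")
```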

Data Analyst

Mu Sigma
07.2015 - 10.2018
  • Gained early-career exposure to a search engine ads platform, learning how Search Engine Optimization and Search Engine Marketing work
  • Developed a clear business understanding of pay-per-click and pay-per-view models and of how the search keyword algorithm ranks results based on keyword relevance and the amounts bid on keywords by various advertisers
  • Engineered robust data pipelines using Azure Data Factory, reducing data processing time by 40% for analytical projects within Microsoft Bing Search Ads
  • Utilized Azure Databricks for real-time data processing, enhancing data freshness and enabling timely decision-making
  • Created a data pipeline from Azure Cosmos DB to build a Power BI tool used to monitor KPIs on a weekly basis
  • Developed interactive dashboards in Power BI, providing actionable insights to stakeholders and driving a 15% increase in decision-making efficiency
  • Created a Power BI tool integrated with Azure Cosmos DB to monitor weekly KPIs, automating data ingestion and visualization processes and improving operational efficiency by 30%
  • Conducted ad-hoc analysis on digital ad campaign launches across EMEA markets using Azure services, achieving 5% higher ROI than predicted through optimized PPC strategies
  • Implemented a Keyword Cloud feature leveraging Azure Cosmos DB, resulting in a 7% growth in ROI by suggesting relevant keywords to new advertisers
  • Deep-dive analysis on engagement drops in EMEA countries, optimizing search engine results and increasing user engagement metrics by 20% through Azure analytics tools
  • Collaborated cross-functionally with marketing and product teams to analyze competitor keywords and optimize SEM strategies, contributing to a 10% improvement in ad campaign performance metrics
  • Proactively stayed updated with Azure cloud technologies and industry best practices, incorporating new features and enhancements to drive continuous improvement in data engineering processes
  • Extracted over 1 million keyword records per day from Azure Cosmos DB using efficient querying methods, maintaining a data retrieval rate of 1,000 records per second to generate an analytical report (see the sketch after this list)
  • Processed and cleaned extracted data using Python Pandas, reducing data anomalies by 15% and improving data quality for subsequent analysis
  • Analyzed keyword frequencies and trends, identifying top-performing keywords that contributed to a 10% increase in ad campaign click-through rates (CTR)
  • Developed interactive visualizations using Matplotlib and Plotly, presenting insights that led to a 20% improvement in stakeholder engagement with reports
  • Automated data pipeline execution using Azure Data Factory, reducing manual intervention by 80% and enabling scalable processing of data volumes up to 10 TB
  • Integrated Python analysis scripts with Power BI, reducing report generation time by 50% and increasing report accuracy
  • Monitored pipeline performance metrics with Azure Monitor, achieving 95% adherence to processing time SLAs and identifying optimization opportunities
  • Documented pipeline architecture and analysis methodologies, facilitating seamless collaboration across teams and reducing onboarding time for new analysts by 30%
  • Implemented iterative improvements based on feedback, resulting in a 15% reduction in data processing costs through efficiency gains and resource optimization
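
A small pandas/Matplotlib sketch of the keyword frequency and CTR analysis described above; the input file and column names are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical daily export of keyword records pulled from Cosmos DB.
keywords = pd.read_csv("keyword_extract.csv",
                       usecols=["keyword", "clicks", "impressions"])

# Basic cleaning: normalise case/whitespace and drop empty keyword rows.
keywords["keyword"] = keywords["keyword"].str.strip().str.lower()
keywords = keywords.dropna(subset=["keyword"])

# Aggregate clicks/impressions per keyword and compute click-through rate.
agg = (keywords.groupby("keyword", as_index=False)
               .agg(clicks=("clicks", "sum"), impressions=("impressions", "sum")))
agg["ctr"] = agg["clicks"] / agg["impressions"].clip(lower=1)
top = agg.nlargest(20, "ctr")

# Horizontal bar chart of the top keywords for the weekly report.
top.plot.barh(x="keyword", y="ctr", legend=False, figsize=(8, 6))
plt.xlabel("CTR")
plt.tight_layout()
plt.savefig("top_keywords_ctr.png")
```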

Education

Bachelor of Science - Computer Science And Engineering

Jawaharlal Nehru Technological University, Kakinada
Guntur
05.2001 -

Master of Science - Data Science

University of Memphis
Memphis, TN
05.2001 -

Skills

  • MySQL

Timeline

Data Engineer

Copart
08.2022 - Current

Jr Data Engineer

Copart
08.2022 - 03.2024

Data Analyst

Mu Sigma
11.2018 - 08.2022

Data Analyst

Mu Sigma
07.2015 - 10.2018

Bachelor of Science - Computer Science And Engineering

Jawaharlal Nehru Technological University, Kakinada
05.2001 -

Master of Science - Data Science

University of Memphis
05.2001 -