Monika Chattu

DATA ENGINEER
Memphis, Tennessee

Summary

  • Nearly 7 years of professional experience in IT data analytics projects, specializing in designing and developing on-premises ETL solutions and modernizing them into AWS cloud-based architectures
  • Proficient in leveraging AWS native services such as AWS Glue for ETL workflows, Amazon Redshift for data warehousing, and Amazon S3 for scalable storage (illustrative Glue job sketch below)
  • Well-versed in orchestration tools like AWS Step Functions and AWS Data Pipeline for seamless workflow automation
  • Extensive experience in building and optimizing data pipelines using Pentaho to facilitate ETL processes, migrating data from MariaDB and flat files to AWS Cloud Services
  • Skilled in implementing serverless ETL solutions using AWS Lambda and processing large-scale datasets with AWS EMR (Elastic MapReduce)
  • Hands-on experience with AWS Database Migration Service (DMS) and Schema Conversion Tool (SCT) for seamless data migration to AWS environments
  • Knowledgeable in AWS Batch for efficient batch processing of large datasets
  • Designed and built real-time data streaming solutions utilizing AWS Kinesis and developed data warehousing solutions on AWS using Amazon Redshift
  • Optimized Amazon S3 storage strategies for cost-effectiveness and improved data retrieval performance
  • Strong understanding of data modeling principles, including Fact and Dimension tables, as well as Snowflake and Star Schema modeling techniques
  • Extensive experience in data warehousing and data mart development using distributed SQL technologies such as Hive SQL, MySQL, and MS SQL
  • In-depth experience working with AWS services like AWS Lambda, AWS Glue, and AWS SDKs to develop and optimize data solutions
  • Expertise in writing SQL-based stored procedures, functions, and complex queries for database design and performance optimization
  • Comprehensive experience in the entire software development lifecycle (SDLC), from requirements analysis and design to development, testing, and deployment
  • Documented best practices, optimization strategies, and performance tuning methodologies for Snowflake-based data pipelines
  • Enhanced collaboration across teams, improving knowledge sharing and driving operational efficiencies in data engineering workflows
  • Improved ETL performance by 50% through optimized data transformations and parallel processing techniques using PySpark
  • Leveraged PySpark for advanced analytics applications, including machine learning model training and predictive analytics within Snowflake, enabling actionable insights
  • Strong understanding of Search Engine Marketing (SEM), Search Engine Optimization (SEO), product ads, and keyword analysis
  • Experienced in data preparation, modeling, and visualization using Power BI, along with expertise in developing interactive dashboards and reports using Tableau
  • Excellent communication and interpersonal skills with a strong ability to quickly adapt and learn new technologies
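
To make the AWS Glue experience concrete, below is a minimal, hypothetical Glue ETL job skeleton in Python (PySpark). It runs inside the AWS Glue environment; the catalog database, table, column mappings, and S3 path are illustrative assumptions, not details from a specific project.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job bootstrap: resolve the job name passed in by the Glue runtime.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Hypothetical source: a Glue Data Catalog table populated by a crawler.
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="invoices")

# Keep and rename only the columns the downstream model needs.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("invoice_no", "string", "invoice_no", "string"),
        ("supplier_id", "string", "supplier_id", "string"),
        ("amount", "double", "amount", "double"),
    ])

# Write curated Parquet to a hypothetical S3 prefix for Redshift/Athena consumption.
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/invoices/"},
    format="parquet")

job.commit()
```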

Overview

10 years of professional experience

Work History

Data Engineer

Copart
08.2022 - Current
  • Worked on loading invoice data from suppliers into JDE tables F0411 and F0911 in an Oracle database
  • Used Informatica PowerCenter Designer to fetch CSV flat files containing invoice data from a secured SFTP server
  • Extracted the data with a Python script, using Pandas to load it into a DataFrame
  • Parsed invoice files containing multiple headers and loaded them into various Snowflake tables (illustrative sketch after this list)
  • Loaded data from CSV files into Snowflake tables using COPY INTO commands and used the Informatica Workflow Manager tool for continuous data flow into the tables
  • Worked on batch processing and monitored error logs in Informatica Workflow Manager to store the details of every batch run
  • Maintained different versions of the integration in GitHub for auditing and deadline estimation
  • Initially loaded 5 out of 30 invoices per day into JDE and improved performance to 90 out of 100 invoices per day
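
A minimal sketch of the multi-header invoice parsing and Snowflake COPY INTO load referenced above, assuming a hypothetical file layout in which a record-type column distinguishes header ("H") and detail ("D") rows; the file names, table names, and connection parameters are illustrative only.

```python
import pandas as pd
import snowflake.connector

# Hypothetical layout: each row starts with a record type ("H" header / "D" detail).
raw = pd.read_csv("invoice_batch.csv", header=None, dtype=str)

headers = raw[raw[0] == "H"].iloc[:, 1:5].copy()
headers.columns = ["invoice_no", "supplier_id", "invoice_date", "amount"]

details = raw[raw[0] == "D"].iloc[:, 1:4].copy()
details.columns = ["invoice_no", "line_no", "line_amount"]

headers.to_csv("invoice_headers.csv", index=False)
details.to_csv("invoice_details.csv", index=False)

# Stage the parsed files to each table's internal stage, then load with COPY INTO.
conn = snowflake.connector.connect(
    account="example_account", user="etl_user", password="***",
    warehouse="LOAD_WH", database="FINANCE", schema="STAGING")
cur = conn.cursor()
for path, table in [("invoice_headers.csv", "INVOICE_HEADER"),
                    ("invoice_details.csv", "INVOICE_DETAIL")]:
    cur.execute(f"PUT file://{path} @%{table} OVERWRITE = TRUE")
    cur.execute(
        f"COPY INTO {table} FROM @%{table} "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1) PURGE = TRUE")
conn.close()
```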

Jr Data Engineer

Copart
08.2022 - 03.2024
  • Involved in Agile Development process (Scrum and Sprint planning)
  • Migrated the entire MemSQL database to AWS cloud infrastructure and created analytical views to generate business-driven reports
  • Successfully migrated multiple MariaDB and MemSQL databases to AWS cloud infrastructure, reducing on-premises maintenance costs by 30% annually
  • Designed and implemented scalable data pipelines, handling up to 1 TB of data daily, resulting in a 40% improvement in data processing efficiency
  • Implemented AWS native services such as Amazon RDS and Amazon Redshift, enhancing database performance by 50% and reducing query response times by 60%
  • Developed automated ETL processes using AWS Glue, reducing data integration time by 50% and ensuring timely availability of critical business insights
  • Optimized database architectures and data models, achieving a 20% reduction in storage costs while maintaining data accessibility and integrity
  • Collaborated cross-functionally with teams to deliver data solutions aligned with business goals, resulting in a 25% increase in data-driven decision-making efficiency
  • Managed compliance requirements during database migrations, achieving 100% adherence to regulatory standards (e.g., GDPR, HIPAA)
  • Mentored interns on AWS best practices and data engineering principles, improving team productivity by 30% and knowledge retention
  • Performed data analysis and data mining on various source systems using SQL
  • Extracted data from data lakes and the EDW into relational databases for analysis, deriving deeper insights using SQL queries and Apache PySpark
  • Developed Airflow DAGs in Python using the Airflow libraries (see the sketch after this list)
  • Built ER diagrams to map the flow of Copart’s operational data across source systems, optimized the data flow, and built an ETL tool to schedule the new data flow to the AWS cloud
  • Optimized SQL queries to leverage Snowflake's query performance features such as query caching and query hints
  • Reduced query execution time by 30% through query profiling and optimization
  • Implemented auto-scaling and manual scaling strategies for Snowflake warehouses based on workload patterns
  • Achieved cost savings of 20% by right-sizing warehouses during off-peak hours
  • Partitioned large tables based on access patterns and query requirements to improve query performance
  • Reduced data scan times by 40% by partitioning tables on frequently queried columns
  • Created materialized views to pre-aggregate data and accelerate query performance for complex analytics queries
  • Improved dashboard loading times by 50% by leveraging materialized views for aggregated reporting
  • Compiled and optimized complex queries to reduce compilation overhead and enhance execution efficiency
  • Achieved a 25% reduction in query compilation time through query optimization techniques
  • Implemented efficient data compression techniques (e.g., automatic, manual) to minimize storage footprint and improve query performance
  • Reduced storage costs by 30% by optimizing data compression settings based on data characteristics
  • Managed query queuing and allocated resources dynamically to prioritize critical workloads and ensure SLA adherence
  • Improved overall system stability and performance by implementing workload management policies
  • Configured and optimized Snowflake's concurrency settings to manage simultaneous query execution and resource contention
  • Increased concurrency limits by 50% through fine-tuning and optimizing concurrency controls
  • Optimized data loading and unloading processes using Snowflake's bulk loading capabilities (e.g., Snowpipe, COPY command)
  • Reduced data loading times by 40% by leveraging parallel loading and efficient file formats (e.g., Parquet, ORC)
  • Designed and optimized Snowflake schemas (e.g., star schema, snowflake schema) to support efficient data querying and analytics
  • Improved query performance by 30% by restructuring schemas based on analytical workload patterns
  • Implemented automated monitoring and alerting for query performance and system metrics using Snowflake's built-in monitoring tools
  • Reduced incident response time by 50% by proactively monitoring and tuning Snowflake resources
  • Optimized ETL workflows and data pipelines using Snowflake's integration with ETL tools (e.g., Informatica, Talend)
  • Achieved a 20% increase in ETL throughput by fine-tuning pipeline configurations and leveraging Snowflake's parallel processing capabilities
  • Implemented best practices for data security (e.g., role-based access control, encryption) and compliance (e.g., GDPR, HIPAA) in Snowflake
  • Enhanced data governance and compliance posture, ensuring adherence to regulatory requirements
  • Optimized backup and disaster recovery strategies using Snowflake's automated backup and replication features
  • Reduced recovery time objectives (RTO) by 40% by implementing efficient backup and restore procedures
  • Documented optimization strategies, configurations, and best practices for Snowflake data pipelines
  • Improved team collaboration and knowledge sharing, enabling continuous improvement and efficiency gains in data operations
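
An illustrative Airflow DAG skeleton in the spirit of the DAG development mentioned in this role; the task callables, DAG id, and schedule are assumptions, not code from an actual pipeline.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_from_edw(**context):
    """Hypothetical extract step: pull the day's records from the EDW."""
    ...


def load_to_redshift(**context):
    """Hypothetical load step: copy the extracted records into Redshift."""
    ...


default_args = {"owner": "data-eng", "retries": 2, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="edw_to_redshift_daily",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    extract = PythonOperator(task_id="extract_from_edw", python_callable=extract_from_edw)
    load = PythonOperator(task_id="load_to_redshift", python_callable=load_to_redshift)
    extract >> load  # run the load only after the extract succeeds
```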

Data Analyst

Mu Sigma
11.2018 - 08.2022
  • Created data pipeline for migrating data of Top Keywords and Stock Keeping Units from Hadoop Hive database to AWS Cloud Infrastructure
  • Utilized AWS EMR clusters to schedule and execute data pipelines transferring data from Hadoop Hive to Amazon Redshift
  • Scheduled and optimized data pipelines using AWS EMR clusters, reducing data transfer times by 40% and enhancing scalability for peak workloads
  • Engineered analytical views and utilized Tableau to deliver a Top Searched Keyword report, facilitating strategic insights for Walmart Media Group and contributing to a 15% increase in advertising effectiveness
  • Deepened understanding of keyword bidding and digital ads monetization, leveraging competitor data from generated reports and achieving a 20% reduction in advertising costs
  • Demonstrated expertise in migrating on-premises workloads to AWS cloud using AWS Glue, Amazon S3, Amazon Redshift, and AWS EMR
  • Ported existing on-premises Hive code to AWS cloud services for enhanced scalability and performance
  • Hands-on experience in migrating on-prem ETL processes to AWS cloud using native tools like AWS Glue, Amazon S3, Amazon Redshift, and AWS EMR
  • Led a data visualization project in Tableau that provided clients insights during new product launches, contributing to a 25% increase in turnover the following financial quarter
  • Implemented comprehensive monitoring and alerting systems using AWS CloudWatch, reducing downtime incidents by 30% and ensuring SLA compliance for critical data pipelines
  • Developed a comprehensive data pipeline for monitoring various KPIs during the holiday season (November – December) from Hadoop Hive to AWS cloud infrastructure
  • Extracted SKU data from a MySQL database, processing 1M records daily with an average extraction time of 10 minutes
  • Utilized PySpark to handle skewed data distributions, achieving an 80% reduction in data skewness and improving processing efficiency (salting sketch after this list)
  • Calculated daily SKU counts accurately for 10,000 products, achieving a 99.5% accuracy rate in data aggregation
  • Implemented ARIMA and Prophet models in Python for SKU demand forecasting, achieving 85% accuracy in predictions
  • Developed Apache Airflow DAGs for automating SKU count and forecasting jobs, reducing manual intervention by 90%
  • Implemented data lineage tracking to document data flow from source to output, ensuring auditability and compliance
  • Monitored pipeline performance metrics using Airflow monitoring tools, optimizing job execution time by 50%
  • Optimized data processing algorithms in PySpark, reducing job execution time from 2 hours to 30 minutes
  • Generated reports and visualizations using matplotlib and Power BI, improving decision-making with SKU count trends
  • Scaled pipeline to handle increased SKU data volumes, maintaining 95% uptime and readiness for future growth
  • Managed resource utilization effectively, achieving high efficiency in data processing and storage costs
  • Implemented CI/CD practices for seamless deployment of pipeline updates, enhancing scalability and reliability
  • Improved stakeholder engagement with insightful visualizations and accurate forecasting, supporting proactive inventory management
  • Increased proactive inventory management strategies by 25% through improved forecasting accuracy
  • Incorporated stakeholder feedback to enhance forecasting models, achieving a 15% improvement in accuracy over six months
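
A sketch of the key-salting pattern often used to handle join skew in PySpark, consistent with the skew-handling bullet above; the table names and salt fan-out are assumptions.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sku-skew-demo").getOrCreate()

NUM_SALTS = 16  # assumed fan-out; tune to the observed skew on sku_id

# Hypothetical inputs: a large fact table skewed on sku_id and a small SKU dimension.
events = spark.table("raw.sku_events")
skus = spark.table("ref.sku_dim")

# Salt the skewed side so rows for a hot sku_id spread across NUM_SALTS buckets.
events_salted = events.withColumn("salt", (F.rand() * NUM_SALTS).cast("int"))

# Replicate the small side once per salt value so the join keys still line up.
salts = spark.range(NUM_SALTS).select(F.col("id").cast("int").alias("salt"))
skus_salted = skus.crossJoin(salts)

# Join on (sku_id, salt): no single partition receives every row for a hot key.
joined = events_salted.join(skus_salted, ["sku_id", "salt"]).drop("salt")

daily_counts = joined.groupBy("event_date", "sku_id").count()
daily_counts.write.mode("overwrite").saveAsTable("curated.daily_sku_counts")
```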

Data Analyst

Mu Sigma
07.2015 - 10.2018
  • Gained early-career exposure to a search engine ads platform, learning how Search Engine Optimization and Search Engine Marketing work
  • Developed a clear business understanding of pay-per-click and pay-per-view models and of how the search keyword algorithm ranks results based on keyword relevance and the amounts bid on keywords by various advertisers
  • Engineered robust data pipelines using Azure Data Factory, reducing data processing time by 40% for analytical projects within Microsoft Bing Search Ads
  • Utilized Azure Databricks for real-time data processing, enhancing data freshness and enabling timely decision-making
  • Created a data pipeline from Azure Cosmos DB to build a Power BI tool used to monitor KPIs on a weekly basis
  • Developed interactive dashboards in Power BI, providing actionable insights to stakeholders and driving a 15% increase in decision-making efficiency
  • Created a Power BI tool integrated with Azure Cosmos DB to monitor weekly KPIs, automating data ingestion and visualization processes and improving operational efficiency by 30%
  • Conducted ad-hoc analysis on digital ad campaign launches across EMEA markets using Azure services, achieving 5% higher ROI than predicted through optimized PPC strategies
  • Implemented a Keyword Cloud feature leveraging Azure Cosmos DB, resulting in a 7% growth in ROI by suggesting relevant keywords to new advertisers
  • Deep-dive analysis on engagement drops in EMEA countries, optimizing search engine results and increasing user engagement metrics by 20% through Azure analytics tools
  • Collaborated cross-functionally with marketing and product teams to analyze competitor keywords and optimize SEM strategies, contributing to a 10% improvement in ad campaign performance metrics
  • Proactively stayed updated with Azure cloud technologies and industry best practices, incorporating new features and enhancements to drive continuous improvement in data engineering processes
  • Extracted over 1 million keyword records per day from Azure Cosmos DB using efficient querying methods, maintaining a data retrieval rate of 1,000 records per second to generate an analytical report (see the sketch after this list)
  • Processed and cleaned extracted data using Python Pandas, reducing data anomalies by 15% and improving data quality for subsequent analysis
  • Analyzed keyword frequencies and trends, identifying top-performing keywords that contributed to a 10% increase in ad campaign click-through rates (CTR)
  • Developed interactive visualizations using Matplotlib and Plotly, presenting insights that led to a 20% improvement in stakeholder engagement with reports
  • Automated data pipeline execution using Azure Data Factory, reducing manual intervention by 80% and enabling scalable processing of data volumes up to 10 TB
  • Integrated Python analysis scripts with Power BI, reducing report generation time by 50% and increasing report accuracy
  • Monitored pipeline performance metrics with Azure Monitor, achieving 95% adherence to processing time SLAs and identifying optimization opportunities
  • Documented pipeline architecture and analysis methodologies, facilitating seamless collaboration across teams and reducing onboarding time for new analysts by 30%
  • Implemented iterative improvements based on feedback, resulting in a 15% reduction in data processing costs through efficiency gains and resource optimization
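
A small pandas/Matplotlib sketch of the keyword frequency and CTR analysis described above; the input file and column names are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical daily export of keyword records pulled from Cosmos DB.
keywords = pd.read_csv("keyword_extract.csv",
                       usecols=["keyword", "clicks", "impressions"])

# Basic cleaning: normalise case/whitespace and drop empty keyword rows.
keywords["keyword"] = keywords["keyword"].str.strip().str.lower()
keywords = keywords.dropna(subset=["keyword"])

# Aggregate clicks/impressions per keyword and compute click-through rate.
agg = (keywords.groupby("keyword", as_index=False)
               .agg(clicks=("clicks", "sum"), impressions=("impressions", "sum")))
agg["ctr"] = agg["clicks"] / agg["impressions"].clip(lower=1)
top = agg.nlargest(20, "ctr")

# Horizontal bar chart of the top keywords for the weekly report.
top.plot.barh(x="keyword", y="ctr", legend=False, figsize=(8, 6))
plt.xlabel("CTR")
plt.tight_layout()
plt.savefig("top_keywords_ctr.png")
```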

Education

Bachelor of Science - Computer Science And Engineering

Jawaharlal Nehru Technological University, Kakinada
Guntur
05.2001 -

Master of Science - Data Science

University of Memphis
Memphis, TN
05.2001 -

Skills

  • MySQL

Timeline

Data Engineer

Copart
08.2022 - Current

Jr Data Engineer

Copart
08.2022 - 03.2024

Data Analyst

Mu Sigma
11.2018 - 08.2022

Data Analyst

Mu Sigma
07.2015 - 10.2018

Bachelor of Science - Computer Science And Engineering

Jawaharlal Nehru Technological University, Kakinada
05.2001 -

Master of Science - Data Science

University of Memphis
05.2001 -