Lahari Mattapally

Cincinnati, OH

Summary

Data Engineer with over 5 years of experience designing and implementing scalable ETL pipelines using Azure Data Factory, AWS Glue, and Apache Spark. Skilled in optimizing data workflows across cloud platforms, including Azure Databricks, Azure Synapse Analytics, and AWS Redshift, to enable high-performance data warehousing and analytics. Proficient in big data technologies and cloud-native solutions such as Azure Data Lake Storage, AWS S3, and Delta Lake. Well-versed in data processing and automation with Apache Kafka, Azure Event Hubs, AWS Kinesis, Power Apps, and Power Automate, handling large-scale datasets and streaming analytics efficiently. Adept at building and maintaining data catalogs, optimizing query performance, and automating CI/CD pipelines for data infrastructure using Azure DevOps and AWS CloudFormation.


Work History

Data Engineer

Fifth Third Bank

11.2024 - Current

Architected schemas and innovative partitioning strategies in Snowflake, applying RBAC to manage secure access to Oracle Database
Documented workflows in Confluence and tracked tasks in Jira, contributing to the Aegis Real-Time Decision Engine and EWS ID Check projects alongside 2 analysts
Engineered and managed ETL pipelines in IBM DataStage, handling 40 GB of data daily
Enabled efficient data ingestion, transformation, and loading into Snowflake to strengthen fraud detection capabilities
Formulated transformation workflows with DBT, leveraging incremental and snapshot models alongside Jinja macros for modular data preparation
Enhanced data quality for fraud analytics, supporting a team of analysts and scientists
Created Power BI dashboards, synthesizing data from Snowflake, T-SQL, and SQL Server to present fraud detection metrics across 100+ datasets
Automated SQL-based refresh schedules, reducing manual efforts by 40%
Improved query performance through materialized views in Snowflake, synchronized updates with Streams and Tasks, and achieved a 25% increase in data accuracy through validation protocols in DBT
Coordinated with 2 fraud analysts and 2 data scientists to strategize workflows for EWS ID Check, integrating real-time analytics pipelines using Snowflake and DBT transformations
Visualized key operational metrics in Power BI, using advanced DAX expressions for fraud detection dashboards tailored to organizational needs.

Data Engineer

DELL

08.2023 - 10.2024

Planned and deployed ETL pipelines in Azure Data Factory, extracting data from SQL Server, transforming it with PySpark in Azure Databricks, and loading it into Synapse Analytics, processing nearly 30,000 records weekly while maintaining script versioning in Git
Developed transformation workflows in Databricks, utilizing PySpark and SQL to pre-process IoT and CRM datasets from operations and sales departments over a three-month period
Integrated real-time data ingestion pipelines using Kafka and optimized Cassandra performance with partitioning and indexing, enabling low-latency queries for time-sensitive analytics
Wrote shell scripts in Linux/Unix environments to automate data ingestion workflows
Built distributed data pipelines using Apache Spark and Scala, ensuring scalability for large datasets, and used Java APIs to connect with Cassandra, facilitating transactional storage and analytics workflows for daily operations
Scheduled and orchestrated workflows with Apache Airflow, linking Databricks notebooks through Python-based DAGs that run every 24 hours
Designed and implemented event-driven API workflows in Azure Functions for efficient data preprocessing and loading
Customized interactive Tableau dashboards, visualizing IoT and CRM metrics sourced from Azure SQL Database, and scheduled monthly data refreshes using SQL and Tableau APIs
Migrated datasets from SQL Server to Snowflake using Python, creating ETL pipelines for structured datasets and streamlining weekly batch processing
Configured Hive tables with partitioning and schemas, supporting Parquet and Avro formats for analytics processing
Set up alerting systems with Kubernetes and Docker, enabling weekly monitoring and issue resolution for ETL pipelines
Generated Kibana dashboards linked to Elasticsearch, providing real-time insights into daily pipeline performance.

Data Engineer

ACCENTURE

07.2021 - 07.2022

Streamlined data transformations in Azure Databricks using Python, PySpark, and SQL, preparing healthcare datasets in JSON, CSV, and Parquet formats for daily processing of over 10,000 records
Drafted reusable scripts in Databricks notebooks for batch and real-time processing, using PySpark and Scala for joins, aggregations, and filtering on datasets containing thousands of daily patient records, supporting weekly processing goals and improving script reuse across the team
Visualized healthcare metrics by creating Power BI dashboards with advanced DAX functions, enabling drill-through features to explore patient records and claims data for over 5 stakeholders while ensuring HIPAA compliance
Launched ETL pipelines in Azure Data Factory, extracting healthcare data from SQL Server and other sources, applying business rules in Databricks, and loading results into Azure Synapse Analytics for seamless integration
Introduced Kafka for real-time ingestion and directed prepared data into Azure Synapse Analytics to support accurate healthcare reporting
Collaborated with three engineers to deploy workflows within a two-month project timeline
Enhanced data operations using Python scripts linked to Azure SQL Database, Azure Fabric, and Data Factory pipelines to handle large-scale healthcare data processing, achieving weekly ingestion targets
Incorporated MongoDB to manage unstructured healthcare data, optimizing query performance for transactional analytics and handling thousands of records daily
Upgraded ETL reliability by utilizing Apache Airflow for workflow scheduling and embedding custom error-handling mechanisms, while monitoring pipelines via Azure Monitor, achieving a 15% reduction in pipeline failures and improving data ingestion quality over a quarter.

Data Engineer

SONATA SOFTWARE

08.2019 - 06.2021

Constructed ETL pipelines using AWS Glue to extract datasets from RDS, transform with DBT, and load into Amazon Redshift, processing over 2 terabytes of data weekly
Wrote advanced SQL queries to support data transformations and ensured seamless integration with downstream systems
Devised real-time data pipelines with Apache Kafka and AWS Lambda, integrating transactional data into Redshift for analytics
Deployed containerized applications via ECR on AWS EC2, collaborating with a team of 3 engineers to deliver scalable solutions
Established machine learning workflows in AWS SageMaker, utilizing Redshift data for predictive modeling
Refined model accuracy over a 2-month period through advanced feature engineering with Python
Coordinated distributed data transformations using PySpark on Hadoop clusters, handling datasets from MongoDB and Cassandra
Optimized queries by configuring partitioning and indexing, reducing query times by 20%
Designed automated workflows in AWS Step Functions, integrating Glue, Lambda, and Redshift processes
Improved task orchestration across 12-week project timelines, reducing manual intervention and ensuring operational efficiency
Configured dashboards in AWS QuickSight, visualizing monthly transactional trends from Redshift and RDS
Wrote Python scripts to parse, clean, and standardize raw datasets
Monitored and maintained pipelines using CloudWatch, addressing system bottlenecks and reducing downtime by 10 hours per month
Configured IAM roles to secure access and improve workflow compliance

Projects

YouTube Data Analytics: Initiated ETL processes with SQL and Python to load data into AWS S3 and Redshift
Partitioned tables, cataloged data with AWS Glue, and processed datasets using Spark on AWS
Set up dashboards in AWS QuickSight to deliver actionable insights
AWS Application Development (Image/Video Detection): Developed a serverless application with AWS Fargate and Lambda for image/video processing
Added Rekognition for face recognition and text detection
Used S3 for storage and implemented scaling for efficient content moderation
Sports Data Analytics Pipeline: Constructed a data pipeline using Azure Data Factory and processed raw formats like CSV and JSON with PySpark in Azure Databricks
Stored the transformed data in Azure Synapse Analytics and created Power BI dashboards to visualize player and team metrics.

Education

Master of Science - Information Systems, Data Analytics & Project Management

Central Michigan University

Bachelor of Technology - Electrical and Electronics Engineering

JNTU Hyderabad

Skills

TECHNICAL SKILLS
Data Engineering: ETL Development, Data Integration, Data Transformation, Real-time Data Processing, Data Pipeline Automation, Data Migration
Data Storage & Processing: Hadoop, Apache Spark, Apache Kafka, BigQuery, Snowflake, Redshift
Programming & Scripting: Python (Pandas, NumPy, PySpark), SQL, Java, Scala, Bash
ETL & Data Modeling: Informatica, Talend, Airflow, DBT, DataStage, AWS Glue, Azure Data Factory, Dimensional Modeling

Statistical & Predictive Modelling: Regression Analysis, Time-series Forecasting, ARIMA Models, Clustering, Predictive Maintenance, Decision Trees
Databases: MySQL, PostgreSQL, Oracle Database, Microsoft SQL Server, SQLite, MongoDB, Cassandra, Redis, Firebase, Couchbase, Amazon DynamoDB
Cloud Platforms & Tools: Azure Data Factory, Azure Synapse Analytics, Azure Databricks, Azure Data Lake, Azure IoT Hub, AWS (Amazon Web Services), AWS Glue, AWS EMR, AWS SageMaker, AWS Athena
DevOps & CI/CD Tools: Jenkins, Bitbucket, Git, GitHub, Docker, Terraform, Kubernetes
Data Analytics & Visualization: Power BI, Tableau, PLEXOS, Advanced DAX Functions, Paginated Reports, Real-time Dashboard Creation

Certification

• Certified Kubernetes Administrator

• Azure Data Engineer Associate

• AWS Developer Associate

