
Lahari Mattapally

Cincinnati, OH

Summary

Data Engineer with over 5 years of experience designing and implementing scalable ETL pipelines using Azure Data Factory, AWS Glue, and Apache Spark. Skilled in optimizing data workflows across cloud platforms, including Azure Databricks, Azure Synapse Analytics, and AWS Redshift, to enable high-performance data warehousing and analytics. Proficient in big data technologies and cloud-native solutions such as Azure Data Lake Storage, AWS S3, and Delta Lake. Well-versed in data processing using Apache Kafka, Azure Event Hubs, Power Apps, Power Automate, and AWS Kinesis for efficient handling of large-scale datasets and streaming analytics. Adept at building and maintaining data catalogs, optimizing query performance, and automating CI/CD pipelines for data infrastructure using Azure DevOps and AWS CloudFormation.

Overview

5 years of professional experience

Work History

Data Engineer

Fifth Third Bank
11.2024 - Current
  • Architected schemas and innovative partitioning strategies in Snowflake, applying RBAC to manage secure access to Oracle Database
  • Recorded workflows in Confluence and tracked tasks in Jira, contributing to Aegis Real-Time Decision Engine and EWS ID Check projects with 2 analysts
  • Engineered and managed ETL pipelines in IBM DataStage, handling 40 GB of data daily
  • Enabled efficient data ingestion, transformation, and loading into Snowflake to strengthen fraud detection capabilities
  • Formulated transformation workflows with DBT, leveraging incremental and snapshot models alongside Jinja macros for modular data preparation
  • Enhanced data quality for fraud analytics, supporting a team of analysts and scientists
  • Created Power BI dashboards, synthesizing data from Snowflake, T-SQL, and SQL Server to present fraud detection metrics across 100+ datasets
  • Automated SQL-based refresh schedules, reducing manual efforts by 40%
  • Improved query performance through materialized views in Snowflake, synchronized updates with Streams and Tasks, and achieved a 25% increase in data accuracy through validation protocols using DBT
  • Coordinated with 2 fraud analysts and 2 data scientists to strategize workflows for EWS ID Check, integrating real-time analytics pipelines using Snowflake and DBT transformations
  • Visualized key operational metrics in Power BI, using a few advanced DAX expressions for fraud detection dashboards tailored to organizational needs.

Data Engineer

DELL
08.2023 - 10.2024
  • Planned and deployed ETL pipelines in Azure Data Factory, extracting data from SQL Server, transforming with PySpark in Azure Databricks, and loading into Synapse Analytics, processing nearly 30,000 records weekly while maintaining script versioning using Git
  • Developed transformation workflows in Databricks, utilizing PySpark and SQL to pre-process IoT and CRM datasets from operations and sales departments over a three-month period
  • Integrated real-time data ingestion pipelines using Kafka and optimized Cassandra performance with partitioning and indexing, enabling low-latency queries for time-sensitive analytics; wrote shell scripts on Linux/Unix to automate data ingestion workflows
  • Built distributed data pipelines using Apache Spark and Scala, ensuring scalability for large datasets, and used Java APIs to connect with Cassandra, facilitating transactional storage and analytics workflows for daily operations
  • Scheduled and orchestrated workflows with Apache Airflow, linking Databricks notebooks through Python-based DAGs for periodic execution every 24 hours
  • Shaped and implemented event-driven API workflows in Azure Functions for efficient data preprocessing and loading
  • Customized interactive Tableau dashboards, visualizing IoT and CRM metrics sourced from Azure SQL Database, and scheduled monthly data refreshes using SQL and Tableau APIs
  • Migrated datasets from SQL Server to Snowflake using Python, creating ETL pipelines for structured datasets and streamlining weekly batch processing
  • Configured Hive tables with partitioning and schemas, supporting Parquet and Avro formats for analytics processing
  • Set up alerting systems with Kubernetes and Docker, enabling weekly monitoring and issue resolution for ETL pipelines
  • Generated Kibana dashboards linked to Elasticsearch, providing real-time insights into daily pipeline performance.

Data Engineer

ACCENTURE
07.2021 - 07.2022
  • Streamlined data transformations in Azure Databricks using Python, PySpark, and SQL, preparing healthcare datasets in JSON, CSV, and Parquet formats for daily processing of over 10,000 records
  • Drafted reusable scripts in Databricks notebooks, enabling efficient batch and real-time processing while leveraging PySpark and Scala for joins, aggregations, and filtering on datasets containing thousands of daily patient records, supporting weekly processing goals and enhancing team script reusability
  • Visualized healthcare metrics by creating Power BI dashboards with advanced DAX functions, enabling drill-through features to explore patient records and claims data for over 5 stakeholders while ensuring HIPAA compliance
  • Launched ETL pipelines in Azure Data Factory, extracting healthcare data from SQL Server and other sources, applying business rules in Databricks, and loading results into Azure Synapse Analytics for seamless integration
  • Introduced Kafka for real-time ingestion and directed prepared data into Azure Synapse Analytics to support accurate healthcare reporting
  • Collaborated with three engineers to deploy workflows within a two-month project timeline
  • Enhanced data operations using Python scripts linked to Azure SQL Database, Azure Fabric, and Data Factory pipelines to handle large-scale healthcare data processing, achieving weekly ingestion targets
  • Incorporated MongoDB to manage unstructured healthcare data, optimizing query performance for transactional analytics and handling thousands of records daily
  • Upgraded ETL reliability by utilizing Apache Airflow for workflow scheduling and embedding custom error-handling mechanisms, while monitoring pipelines via Azure Monitor, achieving a 15% reduction in pipeline failures and improving data ingestion quality over a quarter.

Data Engineer

SONATA SOFTWARE
08.2019 - 06.2021
  • Constructed ETL pipelines using AWS Glue to extract datasets from RDS, transform with DBT, and load into Amazon Redshift, processing over 2 terabytes of data weekly
  • Wrote advanced SQL queries to support data transformations and ensured seamless integration with downstream systems
  • Devised real-time data pipelines with Apache Kafka and AWS Lambda, integrating transactional data into Redshift for analytics
  • Deployed containerized applications via ECR on AWS EC2, collaborating with a team of 3 engineers to deliver scalable solutions
  • Established machine learning workflows in AWS SageMaker, utilizing Redshift data for predictive modeling
  • Refined model accuracy over a 2-month period through advanced feature engineering with Python
  • Coordinated distributed data transformations using PySpark on Hadoop clusters, handling datasets from MongoDB and Cassandra
  • Optimized queries by configuring partitioning and indexing, reducing query times by 20%
  • Designed automated workflows in AWS Step Functions, integrating Glue, Lambda, and Redshift processes
  • Improved task orchestration across 12-week project timelines, reducing manual intervention and ensuring operational efficiency
  • Configured dashboards in AWS QuickSight, visualizing monthly transactional trends from Redshift and RDS
  • Wrote Python scripts to parse, clean, and standardize raw datasets
  • Monitored and maintained pipelines using CloudWatch, addressing system bottlenecks and reducing downtime by 10 hours per month
  • Configured IAM roles to secure access and improve workflow compliance

Projects

  • YouTube Data Analytics: Initiated ETL processes with SQL and Python to load data into AWS S3 and Redshift. Partitioned tables, cataloged data with AWS Glue, and processed datasets using Spark on AWS. Set up dashboards in AWS QuickSight to deliver actionable insights
  • AWS Application Development (Image/Video Detection): Developed a serverless application with AWS Fargate and Lambda for image/video processing. Added Rekognition for face recognition and text detection. Used S3 for storage and implemented scaling for efficient content moderation
  • Sports Data Analytics Pipeline: Constructed a data pipeline using Azure Data Factory and processed raw formats like CSV and JSON with PySpark in Azure Databricks. Stored the transformed data in Azure Synapse Analytics and created Power BI dashboards to visualize player and team metrics.

Education

Master of Science - Information Systems, Data Analytics & Project Management

Central Michigan University

Bachelor of Technology - Electrical and Electronics Engineering

JNTU Hyderabad

Skills

  • TECHNICAL SKILLS
  • Data Engineering Skills: ETL Development, Data Integration, Data Transformation, Real-time Data Processing, Data Pipeline Automation, Data Migration
  • Data Storage & Processing: Hadoop, Apache Spark, Apache Kafka, BigQuery, Snowflake, Redshift
  • Programming & Scripting: Python (Pandas, NumPy, PySpark), SQL, Java, Scala, Bash
  • ETL & Data Modeling: Informatica, Talend, Airflow, DBT, DataStage, AWS Glue, Azure Data Factory, Dimensional Modeling
  • Statistical & Predictive Modelling: Regression Analysis, Time-series Forecasting, ARIMA Models, Clustering, Predictive Maintenance, Decision Trees
  • Databases: MySQL, PostgreSQL, Oracle Database, Microsoft SQL Server, SQLite, MongoDB, Cassandra, Redis, Firebase, Couchbase, Amazon DynamoDB
  • Cloud Platforms & Tools: Azure Data Factory, Azure Synapse Analytics, Azure Databricks, Azure Data Lake, Azure IoT Hub, AWS (Amazon Web Services), AWS Glue, AWS EMR, AWS SageMaker, AWS Athena
  • DevOps & CI/CD Tools: Jenkins, Bitbucket, Git, GitHub, Docker, Terraform, Kubernetes
  • Data Analytics & Visualization: Power BI, Tableau, PLEXOS, Advanced DAX Functions, Paginated Reports, Real-time Dashboard Creation

Certification

• Certified Kubernetes Administrator

• Azure Data Engineer Associate

• AWS Developer Associate
