PAVAN KUMAR REDDY ANNACHEDU

Summary

  • Senior Data Engineer with 10+ years of expertise in on-premises and cloud data solutions, including AWS, Azure, Snowflake, Python, SQL, and NoSQL.
  • Strong experience in Linux/Unix environments; proficient in writing shell scripts.
  • Extensive experience building and optimizing data pipelines, employing ETL processes and automation with MSBI tools such as SSIS, SSRS, and SSAS to ensure seamless data flow across the organization.
  • Agile Scrum practitioner with a proven track record of managing complex data engineering projects and delivering high-quality solutions within tight timelines.
  • Proficient in cloud technologies such as AWS EMR, Redshift, Lambda, Step Functions, and CloudWatch.
  • Integrated diverse data sources into a unified data platform, leveraging Azure services such as Azure Synapse Analytics and Azure Data Lake Storage.
  • Proficient in designing and implementing data storage solutions using Azure Data Lake Storage and Azure Blob Storage for efficient data organization and accessibility.
  • Specialized in Natural Language Processing (NLP), utilizing advanced algorithms and Python-based frameworks to extract insights from unstructured data and enhance data-driven decision-making.
  • Experienced in statistical analysis with SPSS and SAS, data visualization with Excel, and user-behavior tracking with Google Analytics.
  • Skilled in developing process maps with Lucidchart and Microsoft Visio, and in data storage optimization with Oracle Database.
  • Experienced in creating and maintaining Excel dashboards, tracking key metrics with Google Analytics, and using Confluence for project documentation.
  • Proficient in JIRA for project management, ensuring clear communication and collaboration among cross-functional teams throughout the development lifecycle.
  • Well-versed in DevOps practices, implementing CI/CD pipelines and infrastructure as code to streamline development, reduce time-to-deployment, and enhance system reliability.
  • Power BI and Qlik Sense expert with a keen eye for data visualization, transforming raw data into actionable insights for stakeholders and business leaders.
  • Versatile programming skills in Python, PySpark, Java, R, SQL, T-SQL, NoSQL, and MATLAB, enabling scalable and efficient data processing solutions across diverse technology stacks.

Overview

12 years of professional experience

Work History

Senior Data Engineer

PNC Bank
04.2023 - Current
  • Developed and maintained Python scripts for Extract, Transform, Load (ETL) processes, ensuring efficient data movement and transformation across diverse data sources
  • Utilized Python libraries for data manipulation and processing, implementing data cleansing, transformation, and aggregation tasks to prepare data for analysis
  • Designed and implemented database schemas using T-SQL, ensuring efficient data organization and adherence to normalization principles
  • Designed, implemented, and maintained robust data pipelines on Azure using services like Azure Data Factory, ensuring efficient and scalable data processing
  • Developed and managed data warehousing solutions on Azure SQL Data Warehouse (now Azure Synapse Analytics), ensuring optimal performance and accessibility
  • Administered Azure-based databases, including Azure SQL Database and Cosmos DB, ensuring data consistency, availability, and security
  • Optimized SQL queries for performance, utilizing indexing, query hints, and execution plan analysis to enhance database query response times
  • Leveraged Spark SQL for complex data analysis tasks, writing optimized SQL queries to manipulate and aggregate data stored in Hive and Parquet formats
  • Managed and administered Hadoop clusters using Cloudera Manager and Ambari, ensuring high availability and scalability of HDFS and YARN services
  • Administered MS SQL Server databases, including performance monitoring, backup and recovery, and schema management using SQL Server Management Studio (SSMS)
  • Implemented data processing jobs using Scala within Apache Spark, leveraging distributed computing capabilities for large-scale data transformations
  • Developed and maintained applications for big data processing, working with frameworks such as Apache Hadoop and Apache Flink
  • Designed and implemented Snowflake data warehouses, considering best practices for schema design, clustering, and partitioning to ensure optimal performance and scalability
  • Implemented batch job scheduling and automation using Control-M, orchestrating ETL workflows across multiple environments to ensure timely and accurate data processing
  • Designed and developed interactive and visually compelling reports and dashboards in Power BI, incorporating KPIs and data visualizations to provide actionable insights for business stakeholders
  • Implemented and maintained version control for data engineering codebase using Git, ensuring a systematic and organized approach to tracking changes, facilitating collaboration, and enabling easy rollbacks
  • Actively participated in Scrum ceremonies, including sprint planning and backlog refinement, by collaborating with cross-functional teams to prioritize and estimate data engineering tasks, ensuring alignment with project goals
  • Collaborated effectively with cross-functional teams, including data scientists, analysts, software engineers, and business stakeholders, fostering a collaborative environment to achieve project goals
  • Environment: SSRS, Informatica PowerCenter, Python, T-SQL, Spark SQL, Cloudera Manager, Ambari

Sr. Data Engineer

Mayo Clinic
02.2021 - 03.2023
  • Implemented ETL processes using AWS Glue for data integration from multiple sources into Redshift, automating data workflows and improving data accuracy and timeliness
  • Implemented complex ETL processes using Informatica PowerCenter to extract, transform, and load data from various source systems into a centralized data warehouse
  • Designed and developed data integration jobs using Talend to connect disparate data sources and streamline data flows for analytics and reporting purposes
  • Implemented data validation and cleansing routines in Python, ensuring data quality and integrity throughout the ETL pipeline
  • Designed and implemented relational database schemas in SQL Server and MySQL, ensuring efficient data storage and retrieval for analytical applications
  • Developed and maintained ETL pipelines using SQL scripts and stored procedures to extract, transform, and load data from diverse sources into data warehouse systems, ensuring data consistency and integrity
  • Automated data engineering workflows using Apache Spark, implementing scheduling and orchestration to ensure timely execution of data pipelines and tasks
  • Optimized Spark code for performance, adhering to best practices and coding standards to enhance readability, maintainability, and scalability of data engineering solutions
  • Designed and implemented efficient data pipelines using PySpark, ensuring seamless data extraction, transformation, and loading
  • Developed custom MapReduce jobs in Java to process and analyze large datasets stored in Hadoop Distributed File System (HDFS), optimizing data processing workflows
  • Developed and maintained T-SQL stored procedures for data manipulation, ensuring consistent and secure access to database resources
  • Adhered to and promoted best practices for data warehousing within Snowflake, including data governance, version control, and documentation, to ensure the reliability and maintainability of data solutions
  • Orchestrated ETL processes and data integration tasks using Control-M, ensuring timely execution of complex data workflows
  • Created interactive Power BI dashboards and reports for data visualization and business intelligence
  • Managed version control using Git and collaborated with team members for code reviews and branching strategies
  • Used JIRA for bug and issue tracking, and added several configuration options to the application for selecting the algorithm used for data and address generation
  • Communicated effectively with business stakeholders to gather data requirements, understand project goals, and provide updates on data engineering progress, ensuring alignment with organizational objectives
  • Environment: Python, Apache Spark, PySpark, MapReduce, Hadoop, T-SQL, Snowflake, Data warehouse, ETL, JIRA

Data Engineer

T-Mobile
04.2019 - 01.2021
  • Integrated Hadoop for distributed data processing, ensuring efficient handling of large-scale datasets and enhancing data processing capabilities
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python
  • Performed data analysis and data profiling using complex SQL on various source systems, including Oracle and Teradata
  • Enhanced ad-hoc report creation and data processing for business reports by reviewing and modifying SAS Programs and Spark jobs, utilizing Spark's distributed computing framework for scalability and performance improvements
  • Participated in information-gathering meetings and JAD sessions to gather business requirements and deliver the business requirements document and preliminary logical data model
  • Made Power BI reports more interactive and engaging using storytelling features such as bookmarks, selection panes, and drill-through filters; also created custom visualizations using the R programming language
  • Developed and maintained Excel dashboards to monitor KPIs, streamline reporting processes, and support data-driven decision-making
  • Integrated Azure Active Directory (AAD) with data analytics solutions for user authentication and authorization, ensuring secure access to data assets
  • Implemented Azure Monitor for proactive monitoring and alerting of Azure resources, ensuring high availability and performance of data analytics platforms
  • Applied ETL methodology to support data extraction, transformation, and loading in a complex enterprise data warehouse (EDW) using Informatica
  • Environment: SQL, Spark, Hadoop, SAS, DAX, Google Analytics, Confluence

Data Engineer

Fidelity Investments
02.2017 - 03.2019
  • Applied advanced SQL skills, fluency in Python, and advanced Microsoft Office skills, particularly Excel and analytical platforms
  • Performed extensive data validation by writing complex SQL queries; involved in back-end testing and resolving data quality issues
  • Created remote SAS sessions to run jobs in parallel mode, reducing extraction time since the datasets were generated simultaneously
  • Analyzed large datasets using Excel, leveraging advanced functions, pivot tables, and data visualization tools to present findings to stakeholders
  • Created Entity-Relationship (ER) diagrams to model database structures, aiding in the design and development of robust data management solutions
  • Developed process flow diagrams and organizational charts using Microsoft Visio to streamline business operations and improve communication across teams
  • Documented project requirements and progress using Confluence to ensure clear communication and project tracking across all stakeholders
  • Environment: Excel, ER diagrams, Microsoft Visio, Confluence

Data Engineer

CMC Limited
07.2013 - 09.2016
  • Worked with key stakeholders to understand data requirements and translated strategic requirements into usable enterprise information architecture
  • Performed requirements assessments and designed suitable data flows or data batches
  • Utilized statistical methods and machine learning algorithms to derive actionable insights, unveiling trends, correlations and patterns within intricate datasets
  • Executed SQL queries to navigate databases, refine complex queries and conduct data manipulations, assuring prompt data access
  • Established data validation protocols to maintain data integrity and precision
  • Leveraged Python and R for sophisticated data preprocessing tasks, including cleaning, normalization and transformation, readying datasets for comprehensive analysis and modeling
  • Designed and managed A/B testing frameworks to assess the impact of new features or modifications on user behavior and key business metrics, underpinning data-informed strategic decisions
  • Conducted workshops and training sessions to spread data literacy and analytical methodologies throughout the organization, nurturing a foundation of data-centric decision-making
  • Environment: SQL, Excel, ER diagrams, Microsoft Visio, Confluence

Education

CMC Limited
09.2016

Skills

  • Python
  • PySpark
  • Java
  • Scala
  • R
  • SQL
  • T-SQL
  • Shell Scripting
  • Apache Hadoop Ecosystem
  • Cloudera Manager
  • Ambari
  • AWS
  • Azure
  • Snowflake
  • Informatica PowerCenter
  • Talend
  • SSIS
  • Control-M
  • AWS Glue
  • Azure SQL Data Warehouse
  • Azure SQL Database
  • Oracle Database
  • MS SQL Server
  • Cosmos DB
  • Spark SQL
  • Apache Kafka
  • MSBI
  • Power BI
  • Qlik
  • SPSS
  • SAS
  • Excel
  • Google Analytics
  • Microsoft Visio
  • Lucidchart
  • Confluence
  • Git
  • Jenkins
  • AWS CodePipeline
  • AWS CodeDeploy
  • Agile Scrum
  • JIRA
  • Data Cleansing
  • Transformation
  • Aggregation
  • Process Flow Diagrams
  • Network Diagrams
  • System Architecture Layouts
  • Process Maps
  • Workflow Diagrams
  • ER Diagrams

Timeline

Senior Data Engineer

PNC Bank
04.2023 - Current

Sr. Data Engineer

Mayo Clinic
02.2021 - 03.2023

Data Engineer

T-Mobile
04.2019 - 01.2021

Data Engineer

Fidelity Investments
02.2017 - 03.2019

Data Engineer

CMC Limited
07.2013 - 09.2016
