PAVAN KUMAR REDDY ANNACHEDU

Summary

  • Senior Data Engineer with 10+ years of expertise in on-premises and cloud data solutions, including AWS, Azure, Snowflake, Python, SQL, and NoSQL.
  • Strong experience in Linux/Unix environments; proficient in writing shell scripts.
  • Extensive experience building and optimizing data pipelines, employing ETL processes and automation with MSBI tools such as SSIS, SSRS, and SSAS to ensure seamless data flow across the organization.
  • Agile Scrum practitioner with a proven track record of managing complex data engineering projects and delivering high-quality solutions within tight timelines.
  • Proficient in cloud technologies such as AWS EMR, Redshift, Lambda, Step Functions, and CloudWatch.
  • Integrated diverse data sources into a unified data platform, leveraging Azure services such as Azure Synapse Analytics and Azure Data Lake Storage.
  • Proficient in designing and implementing data storage solutions using Azure Data Lake Storage and Azure Blob Storage for efficient data organization and accessibility.
  • Specialized in Natural Language Processing (NLP), utilizing advanced algorithms and Python-based frameworks to extract insights from unstructured data and enhance data-driven decision-making.
  • Experienced in statistical analysis with SPSS and SAS, data visualization with Excel, and user-behavior tracking with Google Analytics.
  • Skilled in developing process maps with Lucidchart and Microsoft Visio, and in data storage optimization with Oracle Database.
  • Experienced in creating and maintaining Excel dashboards, tracking key metrics with Google Analytics, and using Confluence for project documentation.
  • Proficient in JIRA for project management, ensuring clear communication and collaboration among cross-functional teams throughout the development lifecycle.
  • Well-versed in DevOps practices, implementing CI/CD pipelines and infrastructure as code to streamline development, reduce time-to-deployment, and enhance system reliability.
  • Power BI and Qlik Sense expert with a keen eye for data visualization, transforming raw data into actionable insights for stakeholders and business leaders.
  • Versatile programming skills in Python, PySpark, Java, R, SQL, T-SQL, NoSQL, and MATLAB, enabling scalable and efficient data processing solutions across diverse technology stacks.

Overview

12 years of professional experience

Work History

Senior Data Engineer

PNC Bank
04.2023 - Current
  • Developed and maintained Python scripts for Extract, Transform, Load (ETL) processes, ensuring efficient data movement and transformation across diverse data sources
  • Utilized Python libraries for data manipulation and processing, implementing data cleansing, transformation, and aggregation tasks to prepare data for analysis
  • Designed and implemented database schemas using T-SQL, ensuring efficient data organization and adherence to normalization principles
  • Designed, implemented, and maintained robust data pipelines on Azure using services like Azure Data Factory, ensuring efficient and scalable data processing
  • Developed and managed data warehousing solutions on Azure SQL Data Warehouse (now Azure Synapse Analytics), ensuring optimal performance and accessibility
  • Administered Azure-based databases, including Azure SQL Database and Cosmos DB, ensuring data consistency, availability, and security
  • Optimized SQL queries for performance, utilizing indexing, query hints, and execution plan analysis to enhance database query response times
  • Leveraged Spark SQL for complex data analysis tasks, writing optimized SQL queries to manipulate and aggregate data stored in Hive and Parquet formats
  • Managed and administered Hadoop clusters using Cloudera Manager and Ambari, ensuring high availability and scalability of HDFS and YARN services
  • Administered MS SQL Server databases, including performance monitoring, backup and recovery, and schema management using SQL Server Management Studio (SSMS)
  • Implemented data processing jobs using Scala within Apache Spark, leveraging distributed computing capabilities for large-scale data transformations
  • Developed and maintained applications for big data processing, working with frameworks such as Apache Hadoop and Apache Flink
  • Designed and implemented Snowflake data warehouses, considering best practices for schema design, clustering, and partitioning to ensure optimal performance and scalability
  • Implemented batch job scheduling and automation using Control-M, orchestrating ETL workflows across multiple environments to ensure timely and accurate data processing
  • Designed and developed interactive and visually compelling reports and dashboards in Power BI, incorporating KPIs and data visualizations to provide actionable insights for business stakeholders
  • Implemented and maintained version control for data engineering codebase using Git, ensuring a systematic and organized approach to tracking changes, facilitating collaboration, and enabling easy rollbacks
  • Actively participated in Scrum ceremonies, including sprint planning and backlog refinement, by collaborating with cross-functional teams to prioritize and estimate data engineering tasks, ensuring alignment with project goals
  • Collaborated effectively with cross-functional teams, including data scientists, analysts, software engineers, and business stakeholders, fostering a collaborative environment to achieve project goals
  • Environment: SSRS, Informatica PowerCenter, Python, T-SQL, Spark SQL, Cloudera Manager, Ambari

Sr. Data Engineer

Mayo Clinic
02.2021 - 03.2023
  • Implemented ETL processes using AWS Glue for data integration from multiple sources into Redshift, automating data workflows and improving data accuracy and timeliness
  • Implemented complex ETL processes using Informatica PowerCenter to extract, transform, and load data from various source systems into a centralized data warehouse
  • Designed and developed data integration jobs using Talend to connect disparate data sources and streamline data flows for analytics and reporting purposes
  • Implemented data validation and cleansing routines in Python, ensuring data quality and integrity throughout the ETL pipeline
  • Designed and implemented relational database schemas in SQL Server and MySQL, ensuring efficient data storage and retrieval for analytical applications
  • Developed and maintained ETL pipelines using SQL scripts and stored procedures to extract, transform, and load data from diverse sources into data warehouse systems, ensuring data consistency and integrity
  • Automated data engineering workflows using Apache Spark, implementing scheduling and orchestration to ensure timely execution of data pipelines and tasks
  • Optimized Spark code for performance, adhering to best practices and coding standards to enhance readability, maintainability, and scalability of data engineering solutions
  • Designed and implemented efficient data pipelines using PySpark, ensuring seamless data extraction, transformation, and loading
  • Developed custom MapReduce jobs in Java to process and analyze large datasets stored in Hadoop Distributed File System (HDFS), optimizing data processing workflows
  • Developed and maintained T-SQL stored procedures for data manipulation, ensuring consistent and secure access to database resources
  • Adhered to and promoted best practices for data warehousing within Snowflake, including data governance, version control, and documentation, to ensure the reliability and maintainability of data solutions
  • Orchestrated ETL processes and data integration tasks using Control-M, ensuring timely execution of complex data workflows
  • Created interactive Power BI dashboards and reports for data visualization and business intelligence
  • Managed version control using Git and collaborated with team members for code reviews and branching strategies
  • Used JIRA for bug and issue tracking, and added several configuration options to the application for selecting the algorithm used for data and address generation
  • Communicated effectively with business stakeholders to gather data requirements, understand project goals, and provide updates on data engineering progress, ensuring alignment with organizational objectives
  • Environment: Python, Apache Spark, PySpark, MapReduce, Hadoop, T-SQL, Snowflake, Data warehouse, ETL, JIRA

Data Engineer

T-Mobile
04.2019 - 01.2021
  • Integrated Hadoop for distributed data processing, ensuring efficient handling of large-scale datasets and enhancing data processing capabilities
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python
  • Performed data analysis and data profiling using complex SQL on various source systems, including Oracle and Teradata
  • Enhanced ad-hoc report creation and data processing for business reports by reviewing and modifying SAS Programs and Spark jobs, utilizing Spark's distributed computing framework for scalability and performance improvements
  • Participated in information-gathering meetings and JAD sessions to gather business requirements and deliver the business requirements document and preliminary logical data model
  • Made Power BI reports more interactive and engaging using storytelling features such as bookmarks, selection panes, and drill-through filters; also created custom visualizations using the R programming language
  • Developed and maintained Excel dashboards to monitor KPIs, streamline reporting processes, and support data-driven decision-making
  • Integrated Azure Active Directory (AAD) with data analytics solutions for user authentication and authorization, ensuring secure access to data assets
  • Implemented Azure Monitor for proactive monitoring and alerting of Azure resources, ensuring high availability and performance of data analytics platforms
  • Applied ETL methodology to support data extraction, transformation, and loading in a complex enterprise data warehouse (EDW) using Informatica
  • Environment: SQL, Spark, Hadoop, SAS, DAX, Google Analytics, Confluence

Data Engineer

Fidelity Investments
02.2017 - 03.2019
  • Applied advanced SQL skills, fluency in Python, and advanced Microsoft Office skills, particularly Excel and analytical platforms
  • Performed extensive data validation by writing complex SQL queries; involved in back-end testing and resolving data quality issues
  • Created remote SAS sessions to run jobs in parallel mode, reducing extraction time since the datasets were generated simultaneously
  • Analyzed large datasets using Excel, leveraging advanced functions, pivot tables, and data visualization tools to present findings to stakeholders
  • Created Entity-Relationship (ER) diagrams to model database structures, aiding in the design and development of robust data management solutions
  • Developed process flow diagrams and organizational charts using Microsoft Visio to streamline business operations and improve communication across teams
  • Documented project requirements and progress using Confluence to ensure clear communication and project tracking across all stakeholders
  • Environment: Excel, ER diagrams, Microsoft Visio, Confluence

Data Engineer

CMC Limited
07.2013 - 09.2016
  • Worked with key stakeholders to understand data requirements and translated strategic requirements into usable enterprise information architecture
  • Performed requirements assessments and designed suitable data flows or data batches
  • Utilized statistical methods and machine learning algorithms to derive actionable insights, unveiling trends, correlations and patterns within intricate datasets
  • Executed SQL queries to navigate databases, refine complex queries and conduct data manipulations, assuring prompt data access
  • Established data validation protocols to maintain data integrity and precision
  • Leveraged Python and R for sophisticated data preprocessing tasks, including cleaning, normalization and transformation, readying datasets for comprehensive analysis and modeling
  • Designed and managed A/B testing frameworks to assess the impact of new features or modifications on user behavior and key business metrics, underpinning data-informed strategic decisions
  • Conducted workshops and training sessions to spread data literacy and analytical methodologies throughout the organization, nurturing a foundation of data-centric decision-making
  • Environment: SQL, Excel, ER diagrams, Microsoft Visio, Confluence

Education

CMC Limited
09.2016

Skills

  • Python
  • PySpark
  • Java
  • Scala
  • R
  • SQL
  • T-SQL
  • Shell Scripting
  • Apache Hadoop Ecosystem
  • Cloudera Manager
  • Ambari
  • AWS
  • Azure
  • Snowflake
  • Informatica PowerCenter
  • Talend
  • SSIS
  • Control-M
  • AWS Glue
  • Azure SQL Data Warehouse
  • Azure SQL Database
  • Oracle Database
  • MS SQL Server
  • Cosmos DB
  • Spark SQL
  • Apache Kafka
  • MSBI
  • Power BI
  • Qlik
  • SPSS
  • SAS
  • Excel
  • Google Analytics
  • Microsoft Visio
  • Lucidchart
  • Confluence
  • Git
  • Jenkins
  • AWS CodePipeline
  • AWS CodeDeploy
  • Agile Scrum
  • JIRA
  • Data Cleansing
  • Transformation
  • Aggregation
  • Process Flow Diagrams
  • Network Diagrams
  • System Architecture Layouts
  • Process Maps
  • Workflow Diagrams
  • ER Diagrams

Timeline

Senior Data Engineer

PNC Bank
04.2023 - Current

Sr. Data Engineer

Mayo Clinic
02.2021 - 03.2023

Data Engineer

T-Mobile
04.2019 - 01.2021

Data Engineer

Fidelity Investments
02.2017 - 03.2019

Data Engineer

CMC Limited
07.2013 - 09.2016
