
Rohith Pudota

Cypress, TX

Summary

Senior data engineering professional who has delivered scalable, high-performance solutions across 11+ years of IT experience in Data Engineering, Data Warehousing, and Big Data, including 5+ years in data engineering and 4+ years in data warehousing spanning the healthcare, finance, retail, and manufacturing domains. Deep expertise in data architecture, pipeline development, and big data technologies, with a proven track record of optimizing data workflows, enhancing system efficiency, and driving business intelligence initiatives. Strong collaborator, adaptable to evolving project demands, and focused on delivering impactful results through teamwork and innovation. Skilled in SQL, Python, Spark, and cloud platforms, with a strategic approach to data management and problem-solving.

  • Designed, built, and optimized large-scale ETL/ELT pipelines using Azure Data Factory, Azure Databricks, AWS Glue, Snowflake, and Apache Spark.
  • Implemented real-time data ingestion and analytics pipelines leveraging Kafka, Spark Streaming, Azure Event Hubs, and AWS Kinesis.
  • Developed reusable, parameterized pipelines to reduce development time and improve maintainability.
  • Created optimized data models and schemas using dimensional modeling techniques (star schema, snowflake schema).
  • Applied advanced performance tuning techniques such as partitioning, bucketing, indexing, caching, and workload management.
  • Integrated structured, semi-structured, and unstructured data from diverse sources including RDBMS, APIs, and cloud storage.
  • Automated ETL workflows using Apache Airflow, Oozie, Control-M, IBM Tivoli, Jenkins, and Azure DevOps pipelines.
  • Utilized advanced Snowflake features including SnowSQL scripting, Time Travel, zero-copy cloning, and complex SQL functions.
  • Built CI/CD frameworks for automated deployment, testing, and version control of data pipelines using Git, Jenkins, and Azure DevOps.
  • Developed robust data quality and validation frameworks to ensure dataset integrity and reliability.
  • Managed large, distributed clusters and fine-tuned Spark jobs for optimal performance and scalability.
  • Designed real-time analytics dashboards with Power BI and Tableau for actionable business insights.
  • Wrote complex SQL queries, stored procedures, triggers, and functions for data transformation and reporting.
  • Processed geospatial and time-series datasets using specialized algorithms and frameworks.
  • Created monitoring frameworks using AWS CloudWatch, Azure Monitor, and custom logging systems.
  • Implemented data security and governance measures including RBAC, encryption, GDPR, and HIPAA compliance.
  • Integrated hybrid data sources between on-premises and cloud environments.
  • Mentored and guided junior data engineers, fostering collaborative learning.
  • Collaborated with stakeholders to translate business requirements into scalable technical solutions.
  • Operated effectively within Agile and Scrum frameworks (sprint planning, daily stand-ups, retrospectives).
  • Delivered high-quality solutions on time while balancing technical and business priorities.
  • Researched and adopted emerging data engineering tools and technologies.
  • Transformed raw data into meaningful insights that drive business decisions.

Overview

12 years of professional experience

Work History

Senior Data Engineer

Optum
08.2023 - Current
  • Designed and implemented Snowflake stages to efficiently load large datasets from structured, semi-structured, and unstructured sources.
  • Created transient, temporary, and permanent Snowflake tables optimized for cost and query performance.
  • Configured and managed multi-cluster Snowflake warehouses to handle high concurrency and ensure performance consistency.
  • Developed and optimized complex SnowSQL scripts for advanced data transformations and automated ETL processes.
  • Leveraged Snowpipe for continuous, automated data ingestion from AWS S3 into Snowflake, ensuring near real-time data availability.
  • Applied partitioning and clustering strategies to accelerate query execution and optimize storage usage.
  • Implemented role-based access control (RBAC) to enhance security and data governance compliance.
  • Utilized Snowflake Time Travel to enable historical data analysis and restoration for auditing purposes.
  • Developed ETL pipelines in AWS Glue integrating AWS S3, Redshift, and on-premises sources for seamless data flow.
  • Built real-time streaming pipelines using AWS Kinesis and Spark Streaming to process millions of events per day.
  • Integrated AWS SNS and SQS to enable event-driven architecture for data pipeline triggers and notifications.
  • Created standardized and reusable ETL templates to streamline pipeline development and maintenance.
  • Applied Redshift performance tuning techniques using distribution keys, sort keys, and workload management.
  • Automated pipeline health checks, logging, and alerts with AWS CloudWatch, improving operational efficiency.
  • Developed interactive Tableau dashboards to visualize KPIs and operational analytics for business teams.
  • Collaborated closely with cross-functional teams to ensure scalability, accuracy, and business alignment of data solutions.
  • Reference: Prashant Gupta (Pgupta@optum.com)

Senior Data Engineer

Foot Locker
07.2020 - 07.2023
  • Built and deployed machine learning models (classification, regression, clustering) using Python, Scikit-learn, TensorFlow, and PyTorch to solve business problems in healthcare and finance domains.
  • Designed and implemented end-to-end data pipelines for data preprocessing, feature engineering, and model deployment using Azure Databricks, Snowflake, and Airflow.
  • Conducted exploratory data analysis (EDA) and statistical modeling to uncover insights and improve decision-making.
  • Developed real-time predictive analytics solutions leveraging Spark Streaming, Kafka, and Azure Event Hubs for anomaly detection and forecasting.
  • Created data visualizations and dashboards with Power BI and Tableau, enabling stakeholders to monitor KPIs and predictive trends.
  • Applied NLP techniques for sentiment analysis and text classification using transformer models (BERT, GPT).
  • Implemented MLOps practices with Azure DevOps and Docker for CI/CD model deployment pipelines.
  • Collaborated with cross-functional teams to translate business requirements into AI-driven solutions, ensuring scalability and performance.
  • Performed A/B testing, model validation, and hyperparameter tuning to optimize performance.
  • Ensured compliance with data governance, security, and privacy regulations (HIPAA, GDPR) while handling sensitive data.
  • Designed and implemented scalable ETL processes to integrate complex data from multiple sources.
  • Reengineered existing ETL workflows, identifying bottlenecks and optimizing code to improve performance and streamline data flow.
  • Collaborated with cross-functional teams to define data requirements, develop end-to-end solutions, and establish best practices for analytics.
  • Mentored junior data engineers, fostering skill development and improving team productivity.
  • Reference: Nchatterjee@footlocker.com

Big Data Developer

Verizon
10.2017 - 04.2020
  • Created and maintained transient, temporary, and permanent Snowflake tables to support a variety of analytical workloads.
  • Implemented advanced partitioning and clustering strategies in Snowflake to improve query performance and optimize data retrieval.
  • Defined robust roles, privileges, and security policies to ensure secure access and compliance with corporate governance.
  • Applied regular expressions and advanced SQL functions to perform data extraction, validation, and transformation.
  • Developed Snowflake scripting solutions to automate ETL processes, data cleansing, and schema updates.
  • Designed and maintained ETL workflows using AWS Glue to extract, transform, and load data into Redshift from multiple data sources.
  • Configured and optimized Redshift clusters for performance, cost efficiency, and scalability.
  • Integrated AWS SNS and SQS to facilitate real-time event processing and messaging between systems.
  • Leveraged AWS Athena for ad-hoc data queries on S3 without the need for ETL staging.
  • Built and deployed AWS Kinesis-based streaming pipelines for real-time analytics and monitoring.
  • Managed DNS configurations and traffic routing via AWS Route53 to support reliable application access.
  • Created IAM roles and policies for secure AWS resource management and access control.
  • Designed large-scale batch processing workflows using Hadoop ecosystem components including HDFS, Hive, Sqoop, MapReduce, and Spark.
  • Implemented Spark Streaming applications for near real-time data transformation and analytics.
  • Reference: Tylerknight@verizon.com

Big Data Developer

Verizon
02.2014 - 09.2017
  • Imported large datasets from MySQL into HDFS using Apache Sqoop for efficient, scalable data ingestion.
  • Performed aggregations on high-volume datasets using Apache Spark and Scala, storing results in Hive for analytics.
  • Managed and maintained enterprise Data Lakes leveraging Hadoop distributions including Hortonworks and Cloudera.
  • Developed and optimized Hive queries to meet complex business reporting requirements.
  • Created and managed HBase tables integrated with Hive for high-performance storage and querying.
  • Built streaming pipelines using Kafka and Spark Streaming for near real-time data processing.
  • Developed data ingestion workflows using Apache Flume and Sqoop to capture customer behavioral data.
  • Applied MapReduce and Hive for large-scale batch analytics across multi-terabyte datasets.
  • Implemented a robust Kafka-Spark-Hive pipeline to process, transform, and store streaming and batch data.
  • Migrated relational datasets from Oracle into Hadoop via Sqoop for advanced analytics use cases.
  • Developed PL/SQL scripts for automated data validation, cleansing, and transformation.
  • Applied partitioning and bucketing strategies in Hive to optimize performance.
  • Created CI/CD pipelines for deploying Hadoop-based projects using Jenkins.
  • Leveraged JIRA for task tracking, defect management, and Agile sprint execution.
  • Utilized PySpark and Spark SQL to accelerate data transformations and improve execution speed.
  • Implemented Spark Streaming to process micro-batch streaming data for analytics dashboards.
  • Used Apache Zookeeper for distributed service coordination within Hadoop clusters.
  • Scheduled and orchestrated workflows using Apache Oozie to manage data processing pipelines.
  • Maintained source code repositories in Git for collaborative development and version tracking.
  • Implemented randomized sampling techniques to optimize surveys.
  • Improved data collection methods by designing surveys, polls, and other instruments.
  • Analyzed large datasets to identify trends and patterns in customer behavior.
  • Compiled, cleaned, and manipulated data for proper handling.
  • Developed polished visualizations to share results of data analyses.
  • Developed scalable big data solutions utilizing Hadoop ecosystem tools to optimize data processing workflows.
  • Reference: Tylerknight@verizon.com

Education

Master of Science - Computer Science

Christian Brothers University
Memphis, TN
12.2013

Bachelor of Science - Computer Science

Karunya University
Coimbatore, Tamil Nadu, India
05.2012

Skills

  • Cloud Platforms: Azure (Data Factory, Databricks, Logic Apps, Function Apps, Event Hubs, Synapse), AWS (S3, Redshift, Glue, Lambda, Kinesis, EMR, CloudWatch)
  • Data Warehousing: Snowflake, Redshift, Azure Synapse, Teradata, Oracle, SQL Server
  • Big Data Technologies: Hadoop (HDFS, Hive, MapReduce, HBase, Pig, Sqoop, Oozie, Zookeeper), Spark (PySpark, Spark SQL, Spark Streaming), Kafka, Flume
  • Programming Languages: Python, Scala, Java, SQL, PL/SQL, HiveQL
  • ETL/ELT Tools: Azure Data Factory, AWS Glue, SSIS, Informatica
  • Scheduling & Orchestration: Apache Airflow, Oozie, Control-M, IBM Tivoli, Jenkins, Azure DevOps
  • Data Modeling: Dimensional modeling, Star schema, Snowflake schema, Slowly Changing Dimensions (SCD)
  • Visualization & Reporting: Power BI, Tableau, SSRS
  • Version Control: Git, GitHub, GitLab, Azure Repos
  • Other Tools: Docker, Terraform, JIRA, Confluence, Ambari, Erwin Data Modeler
  • Operating Systems: Windows, Linux, Unix, Ubuntu, CentOS
  • NoSQL databases
  • Python programming
  • Big data processing
  • ETL development
  • Git version control
  • Kafka streaming
  • Data pipeline design
  • Data modeling
  • API development
  • Data visualization
  • Analytical skills
  • Problem-solving aptitude
  • Agile methodologies
  • Data-driven decision making
  • Data repositories
  • Large dataset management
  • Data programming
  • Amazon Redshift
  • Data governance
  • Big data technologies
  • SQL transactional replications
  • Data analysis
  • Relational databases
  • Data migration
  • RDBMS
  • SQL programming
  • Database design
  • SQL and databases
  • Data integration
  • Scala programming
  • Data security
  • Machine learning
  • Advanced SQL
  • Spark development
  • Data warehousing
  • Hadoop ecosystem
  • Performance tuning
  • Backup and recovery
  • Enterprise resource planning software
  • Analytical thinking

Timeline

Senior Data Engineer

Optum
08.2023 - Current

Senior Data Engineer

Foot Locker
07.2020 - 07.2023

Big Data Developer

Verizon
10.2017 - 04.2020

Big Data Developer

Verizon
02.2014 - 09.2017

Master of Science - computer science

Christian Brothers University

Bachelor of Science - computer science

Karunya University