Rohanth Pulimamidi

Summary

Accomplished Data Engineer with extensive experience at CVS Health, specializing in ETL development and data pipeline design. Expert in optimizing data workflows with Spark and Azure Data Factory, and in migrating critical data to cloud environments to improve data accessibility and integrity across healthcare systems.

Brings a strong background in data architecture and scalable pipeline development, with advanced SQL and Python skills used to build and maintain robust, efficient data systems. Known for safeguarding data integrity, supporting cross-functional teams, and adapting to dynamic project requirements to deliver reliable, timely solutions that drive data-driven decision-making.

Overview

6 years of professional experience

Work History

Data Engineer

CVS Health
11.2023 - Current
  • Applied Agile and Scrum methodologies to effectively manage and execute various phases of the Software Development Life Cycle (SDLC)
  • Enhanced and optimized Hadoop algorithms leveraging Spark Context, Spark-SQL, DataFrames, Pair RDDs, and YARN for better performance
  • Migrated data from traditional relational databases to Azure cloud databases
  • Designed and implemented scalable database solutions using Azure SQL Data Warehouse and Azure SQL
  • Established seamless data-sharing mechanisms between Snowflake accounts and developed stored procedures and views in Snowflake
  • Gained extensive experience with healthcare messaging and interoperability standards, including HL7 (versions 2.x and 3.x), FHIR, CCDA, and X12 HIPAA, as well as clinical modules such as Blood Bank, Microbiology, Radiology, Pathology, and Patient Access
  • Hands-on experience writing MapReduce and YARN programs using Java, Scala, and Python to analyze Big Data
  • Built multiple Proof of Concepts (POCs) using Scala, deployed them on YARN clusters, and conducted comparative performance analyses between Spark, Hive, and SQL/Teradata
  • Developed Spark applications in Databricks using Spark-SQL for data extraction, transformation, aggregation, and deep analysis of customer usage patterns (see the sketch after this list)
  • Integrated and analyzed diverse healthcare data sources, including EHRs and medical imaging, using Azure Synapse Analytics, creating comprehensive patient health profiles
  • Processed and analyzed extensive datasets using tools such as HDFS, Hive, HQL, Sqoop, and Zookeeper
  • Proficient in Azure Data Factory activities, including Lookup, Stored Procedure, If Condition, ForEach, Set Variable, Append Variable, Get Metadata, Filter, and Wait
  • Imported and transformed data from multiple sources using Hive and MapReduce, efficiently loading data into HDFS
  • Utilized Python libraries such as Scikit-learn for data visualization, interpretation, and strategic decision-making
  • Implemented robust security measures within Azure Synapse to maintain compliance with HIPAA and safeguard the confidentiality of patient information
  • Automated application deployments using Jenkins, integrating it seamlessly with Git for version control
  • Created custom Azure Data Factory pipelines, including Copy activities and advanced transformation logic
  • Automated data imports into Azure SQL and web APIs using Python scripts and scheduled web jobs for recurring data loads
  • Designed and implemented a comprehensive ETL strategy, leveraging Talend to populate data warehouses from various source systems
  • Built C# applications for efficient data transfers between Azure Blob Storage and Azure SQL, with additional functionality for API data loads
  • Analyzed customer behavior data using MongoDB to derive actionable insights
  • Validated data integrity by implementing MapReduce-based filtering techniques to eliminate unnecessary records prior to loading Hive tables
  • Collaborated with QA teams using tools like Bugzilla and JIRA for issue tracking and defect management
  • Imported large-scale datasets from relational databases such as Oracle and Teradata into HDFS using Sqoop
  • Performed advanced analytics on Hive data using the Spark API over Cloudera Hadoop YARN
  • Environment: Agile, Azure, Azure Databricks, Azure Data Lake, Azure Data Factory, Hadoop, Scala, Spark, PySpark, Python, HDFS, Kafka, MongoDB, Docker, ETL, Hive, Oozie, Zookeeper, Airflow, Snowflake, NiFi, Power BI, Git
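
A minimal PySpark/Spark-SQL sketch of the Databricks usage-pattern work noted above, assuming a metastore-registered source table; the table and column names (curated.member_claims, member_id, claim_date, claim_amount, analytics.member_usage_profile) are hypothetical placeholders, not actual CVS Health objects.

    # Hedged sketch: hypothetical table and column names, not production objects.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("usage-pattern-aggregation").getOrCreate()

    # Extract: read a curated table registered in the metastore
    claims = spark.table("curated.member_claims")

    # Transform: normalize dates and drop records without an identifier
    claims = (claims
              .withColumn("claim_month", F.date_trunc("month", F.col("claim_date")))
              .filter(F.col("member_id").isNotNull()))

    # Aggregate: monthly usage profile per member
    usage_profile = (claims
                     .groupBy("member_id", "claim_month")
                     .agg(F.count("*").alias("claim_count"),
                          F.sum("claim_amount").alias("total_amount")))

    # Load: persist the aggregate for downstream reporting
    usage_profile.write.mode("overwrite").saveAsTable("analytics.member_usage_profile")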

Data Engineer

Costco
08.2022 - 05.2023
  • Created end-to-end data pipelines, starting with distributed messaging systems like Kafka for data ingestion and persisting records into Cassandra
  • Implemented Spark jobs in Scala, leveraging the Spark SQL API with RDDs, Pair RDDs, DataFrames, and Datasets for efficient data processing and transformation
  • Designed and deployed Azure Data Factory pipelines to orchestrate data flow from on-premise SQL databases to Azure Data Lake, including data staging and transformations
  • Migrated critical business data using SQL, SQL Azure, Azure Storage, and Azure Data Factory, along with tools like SSIS and PowerShell, to streamline the process
  • Configured Azure Data Factory triggers to monitor and schedule pipelines, implemented alerts to notify failures, and automated routine tasks to ensure seamless data workflows
  • Used Keras for seismic data analysis, optimizing well logs, geological information, and complex database structures in oil and gas domains
  • Automated data transfer between RDBMS and HDFS using Apache NiFi, improving operational efficiency
  • Designed and developed scalable database solutions using Azure SQL, Azure SQL Data Warehouse, and Databricks for data integration and advanced analytics
  • Analyzed customer behavior data and developed insights using NoSQL databases like MongoDB and relational databases including PostgreSQL and MySQL
  • Developed automation tools using PowerShell scripts and JSON templates to manage services and remediate infrastructure issues
  • Designed predictive maintenance models on Azure Synapse Analytics, optimizing maintenance schedules and reducing costs in industrial processes
  • Collaborated in multiple phases of the SDLC using Agile and Scrum methodologies, ensuring seamless development and delivery
  • Supported MapReduce programs and wrote custom MapReduce jobs using the Java API to analyze and process large datasets
  • Worked on Hadoop services like HDFS, Kafka, Hive, Zookeeper, Oozie, NiFi, Snowflake, PySpark, and Airflow, with experience in troubleshooting Spark Databricks clusters
  • Migrated and transformed data using Hive, Spark RDDs, Python, and Scala, converting legacy SQL queries for modern analytics pipelines
  • Deployed and maintained CI/CD pipelines with Jenkins, integrated with GIT and Maven for version control and seamless deployment workflows
  • Implemented real-time streaming data processing using Kafka brokers, Spark context, and RDDs, loading processed information into HDFS and NoSQL databases (see the sketch after this list)
  • Developed ETL processes and automation using Python-SQL frameworks, integrating analytics libraries like NumPy, SciPy, and Pandas, alongside MATLAB for advanced modeling
  • Created custom ETL solutions using Informatica, Oracle databases, PL/SQL, Python, and Shell scripting for efficient data extraction and transformation
  • Scheduled and monitored pipelines in Azure Data Factory (V2), handling diverse data sources and enabling efficient data operations
  • Wrote and tested unit cases using JUnit and Mockito frameworks to ensure robust application performance
  • Developed robust UNIX shell scripts for automating ETL workflows, reducing manual interventions
  • Utilized tools like Apache Spark for log data analysis and error prediction, while maintaining high data quality standards
  • Environment: Agile, Azure, Hadoop, Scala, Spark, PySpark, Python, HDFS, Kafka, CloudWatch, CloudFormation, Athena, MongoDB, Docker, ETL, Hive, Oozie, Zookeeper, Airflow, Snowflake, NiFi, Tableau, Git
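
A minimal sketch of the Kafka-to-HDFS streaming described above, written with Spark Structured Streaming rather than the original Spark context/RDD approach; the broker address, topic name, and HDFS paths are hypothetical, and the spark-sql-kafka connector is assumed to be available on the cluster.

    # Hedged sketch: broker, topic, and paths are placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("kafka-ingestion").getOrCreate()

    # Read a stream of events from a Kafka topic
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "store-transactions")
              .load())

    # Kafka delivers key/value as bytes; cast the payload to string for parsing
    parsed = events.select(F.col("key").cast("string").alias("event_key"),
                           F.col("value").cast("string").alias("payload"),
                           F.col("timestamp"))

    # Persist the stream to HDFS as Parquet, with checkpointing for recovery
    query = (parsed.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/raw/store_transactions")
             .option("checkpointLocation", "hdfs:///checkpoints/store_transactions")
             .outputMode("append")
             .start())

    query.awaitTermination()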

Data Engineer

Goldman Sachs
07.2021 - 08.2022
  • Designed and built event-driven ETL pipelines using AWS Glue to process new data appearing in AWS S3, ensuring proper understanding of data assets (see the sketch after this list)
  • Developed Spark workflows in Scala to retrieve data from AWS S3 buckets and Snowflake, applying necessary transformations for downstream processing
  • Created and optimized data pipelines for ingestion, aggregation, and consumer response data loading from AWS S3 into Hive external tables on HDFS, enabling Tableau dashboards
  • Utilized Apache Kafka for implementing a distributed messaging queue to integrate seamlessly with Cassandra for real-time data processing
  • Worked extensively with AWS services such as CloudWatch, CloudTrail, CloudFormation, Athena, and Glue, alongside Docker, Kubernetes, and Terraform, to build scalable solutions
  • Deployed Big Data Hadoop applications on AWS Cloud using Talend, ensuring reliability and performance
  • Developed Spark applications in both Scala and Python to process streaming data from multiple sources and conducted advanced data transformations
  • Designed and developed ETL jobs and pipelines leveraging tools such as IBM DataStage and Apache NiFi for robust data handling
  • Used Pig scripts to create MapReduce jobs, performing ETL processes on HDFS for structured and semi-structured data
  • Transformed and loaded large datasets, including structured, semi-structured, and unstructured data, using Hadoop concepts and tools like Hive and MapReduce
  • Participated in Agile-based development processes, incorporating test-driven development and pair programming for collaborative problem-solving
  • Wrote and optimized build scripts using Maven and configured the Log4J Logging framework for tracking and debugging applications
  • Converted complex Cassandra, Hive, and SQL queries into Spark transformations using Spark RDDs, Scala, and Python
  • Developed data pipelines using MapReduce, Sqoop, Flume, and Pig to analyze customer behavioral data stored in HDFS
  • Implemented CI/CD pipelines with tools like Jenkins, Maven, and Docker to automate deployment workflows and improve operational efficiency
  • Worked with visualization and analysis tools, including Tableau and Splunk, to create reports such as bar graphs and pie charts and to conduct regression analysis
  • Automated data movement between RDBMS and HDFS using Apache NiFi, improving data integration efficiency
  • Gained hands-on experience with NoSQL databases, such as HBase and Cassandra, for managing large-scale distributed data
  • Loaded and transformed data from various streaming and batch sources into Hadoop environments, leveraging advanced techniques for data preparation
  • Configured and developed ETL processes using AWS Glue, automating transformations and loading for analysis-ready datasets
  • Environment: Scrum, Hadoop, Hive, Spark, PySpark, Oozie, Storm, YARN, Snowflake, Flume, NiFi, Airflow, AWS, ETL, Maven, NoSQL, Cassandra, Scala, Python
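
A minimal sketch of an event-driven AWS Glue job of the kind described in the first bullet above; the catalog database and table names, column mappings, and S3 output path are hypothetical placeholders, and the job is assumed to be launched by a trigger when new objects land in S3.

    # Hedged sketch: catalog names, mappings, and the S3 path are placeholders.
    import sys
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the newly arrived data through the Glue Data Catalog
    source = glue_context.create_dynamic_frame.from_catalog(
        database="raw_trades", table_name="daily_positions")

    # Rename and retype columns on the way through
    mapped = ApplyMapping.apply(
        frame=source,
        mappings=[("acct_id", "string", "account_id", "string"),
                  ("pos_qty", "long", "quantity", "long"),
                  ("as_of", "string", "as_of_date", "date")])

    # Write curated Parquet back to S3 for Hive external tables and Tableau
    glue_context.write_dynamic_frame.from_options(
        frame=mapped,
        connection_type="s3",
        connection_options={"path": "s3://curated-bucket/daily_positions/"},
        format="parquet")

    job.commit()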

Data Analyst

Indian Com
08.2019 - 07.2021
  • Imported data from MySQL into HDFS using Sqoop, applied transformations using Hive and MapReduce, and loaded processed data back into HDFS
  • Developed PySpark programs to read and write multiple data formats, including JSON, ORC, and Parquet, and stored them on HDFS for downstream processing (see the sketch after this list)
  • Utilized Spark-SQL to load and process JSON data, creating Schema RDDs and loading structured data into Hive tables for advanced analytics
  • Created a PySpark framework to migrate data from DB2 into Amazon S3, enabling efficient cloud storage solutions
  • Built complex MapReduce streaming jobs in Java for ETL tasks, including data cleaning, scrubbing, and transformations, seamlessly integrating with Hive and Pig
  • Leveraged Agile and Scrum methodologies to actively participate in sprints, refine requirements, and ensure effective collaboration across teams
  • Designed Hive and Pig scripts to handle transformations, event joins, traffic filtering, and pre-aggregations before storing data in HDFS for long-term storage
  • Created validation queries to verify ETL processes after each run, shared results with business users, and ensured data integrity during various project phases
  • Converted SQL and Hive queries into Spark transformations using Spark RDDs, Scala, and Python to enable faster and more scalable data processing
  • Built HBase tables to store large volumes of semi-structured data from diverse sources, ensuring scalability and reliability in NoSQL environments
  • Developed Spark pipelines using Spark RDDs and DataFrames to process large datasets, optimize performance, and reduce processing times
  • Used Spark with Scala to create ETL pipelines, handle real-time data ingestion, and enable seamless integration with structured and semi-structured data sources
  • Designed and executed workflows to support the migration of legacy databases to cloud-based storage systems using AWS S3 and Snowflake
  • Designed and implemented data transformations using PySpark, ensuring efficient data processing for analytics applications
  • Integrated and processed large datasets using Spark for faster insights, while handling structured, semi-structured, and unstructured data with ease
  • Imported and processed real-time data from RDBMS sources into HDFS, applying Hive for querying and MapReduce for parallel processing
  • Developed monitoring dashboards to track ETL pipeline performance, ensuring timely identification of bottlenecks and troubleshooting errors effectively
  • Environment: Agile, AWS, S3, EC2, Redshift, RDS, HDFS, HBase, MapReduce, Hive, Spark, Spark SQL, Zookeeper, MongoDB, Git, R, Python, ETL
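
A minimal PySpark sketch of the multi-format read/write work noted above; the HDFS paths and the event_id column are hypothetical placeholders.

    # Hedged sketch: paths and the event_id column are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("format-conversion").getOrCreate()

    # Read raw JSON events landed on HDFS
    events = spark.read.json("hdfs:///data/landing/events/")

    # Light cleanup: drop fully empty rows and de-duplicate on the record key
    cleaned = events.dropna(how="all").dropDuplicates(["event_id"])

    # Write columnar copies for downstream consumers: Parquet for analytics,
    # ORC for Hive-centric workloads
    cleaned.write.mode("overwrite").parquet("hdfs:///data/curated/events_parquet/")
    cleaned.write.mode("overwrite").orc("hdfs:///data/curated/events_orc/")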

Education

Master of Science

Oklahoma Christian University
Edmond, OK
12.2025

Skills

  • ETL development
  • Data warehousing
  • Data modeling
  • Data pipeline design

Tools And Technologies

Hadoop, Apache Spark, HDFS, MapReduce, Sqoop, Hive, Oozie, Zookeeper, Kafka, NiFi, Databricks, Snowflake, Informatica, Talend, Power BI, Tableau, Java, Python, Scala, SQL, Shell Scripting, R, MySQL, Oracle, PostgreSQL, MS SQL Server, NoSQL, HBase, Cassandra, DynamoDB, MongoDB, Cloudera, Git, Bitbucket, SVN, CVS, Linux, Unix, macOS, Windows, Azure, AWS

Timeline

Data Engineer

CVS Health
11.2023 - Current

Data Engineer

Costco
08.2022 - 05.2023

Data Engineer

Goldman Sachs
07.2021 - 08.2022

Data Analyst

Indian Com
08.2019 - 07.2021

Master of Science

Oklahoma Christian University