Rohanth Pulimamidi

Summary

Accomplished Data Engineer with extensive experience at CVS Health, specializing in ETL development and data pipeline design. Expert in optimizing data workflows with Spark and Azure Data Factory, and in migrating critical data to cloud environments to improve data accessibility and integrity across healthcare systems.

Brings a strong background in data architecture and scalable pipeline development, with advanced SQL and Python skills used to build and maintain robust, efficient data systems. Known for safeguarding data integrity, supporting cross-functional teams, and adapting to dynamic project requirements to deliver reliable, timely solutions that drive data-driven decision-making.

Overview

6 years of professional experience

Work History

Data Engineer

CVS Health
11.2023 - Current
  • Applied Agile and Scrum methodologies to effectively manage and execute various phases of the Software Development Life Cycle (SDLC)
  • Enhanced and optimized Hadoop algorithms leveraging Spark Context, Spark-SQL, DataFrames, Pair RDDs, and YARN for better performance
  • Migrated data from traditional relational databases to Azure cloud databases
  • Designed and implemented scalable database solutions using Azure SQL Data Warehouse and Azure SQL
  • Established seamless data-sharing mechanisms between Snowflake accounts and developed stored procedures and views in Snowflake
  • Gained extensive experience with healthcare messaging and interoperability standards, including HL7 (versions 2.x and 3.x), FHIR, CCDA, and X12 HIPAA, as well as clinical modules such as Blood Bank, Microbiology, Radiology, Pathology, and Patient Access
  • Hands-on experience writing MapReduce and YARN programs using Java, Scala, and Python to analyze Big Data
  • Built multiple Proof of Concepts (POCs) using Scala, deployed them on YARN clusters, and conducted comparative performance analyses between Spark, Hive, and SQL/Teradata
  • Developed Spark applications in Databricks using Spark-SQL for data extraction, transformation, aggregation, and deep analysis of customer usage patterns (see the sketch after this list)
  • Integrated and analyzed diverse healthcare data sources, including EHRs and medical imaging, using Azure Synapse Analytics, creating comprehensive patient health profiles
  • Processed and analyzed extensive datasets using tools such as HDFS, Hive, HQL, Sqoop, and Zookeeper
  • Proficient in Azure Data Factory activities, including Lookup, Stored Procedure, If Condition, ForEach, Set Variable, Append Variable, Get Metadata, Filter, and Wait
  • Imported and transformed data from multiple sources using Hive and MapReduce, efficiently loading data into HDFS
  • Utilized Python libraries such as Scikit-learn for data visualization, interpretation, and strategic decision-making
  • Implemented robust security measures within Azure Synapse to maintain compliance with HIPAA and safeguard the confidentiality of patient information
  • Automated application deployments using Jenkins, integrating it seamlessly with Git for version control
  • Created custom Azure Data Factory pipelines, including Copy activities and advanced transformation logic
  • Automated data imports into Azure SQL and web APIs using Python scripts and scheduled web jobs for recurring data loads
  • Designed and implemented a comprehensive ETL strategy, leveraging Talend to populate data warehouses from various source systems
  • Built C# applications for efficient data transfers between Azure Blob Storage and Azure SQL, with additional functionality for API data loads
  • Analyzed customer behavior data using MongoDB to derive actionable insights
  • Validated data integrity by implementing MapReduce-based filtering techniques to eliminate unnecessary records prior to loading Hive tables
  • Collaborated with QA teams using tools like Bugzilla and JIRA for issue tracking and defect management
  • Imported large-scale datasets from relational databases such as Oracle and Teradata into HDFS using Sqoop
  • Performed advanced analytics on Hive data using the Spark API over Cloudera Hadoop YARN
  • Environment: Agile, Azure, Azure Databricks, Azure Data Lake, Azure Data Factory, Hadoop, Scala, Spark, PySpark, Python, HDFS, Kafka, MongoDB, Docker, ETL, Hive, Oozie, Zookeeper, Airflow, Snowflake, NiFi, Power BI, Git
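
A minimal PySpark/Spark-SQL sketch of the Databricks usage-pattern work noted above, assuming a metastore-registered source table; the table and column names (curated.member_claims, member_id, claim_date, claim_amount, analytics.member_usage_profile) are hypothetical placeholders, not actual CVS Health objects.

    # Hedged sketch: hypothetical table and column names, not production objects.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("usage-pattern-aggregation").getOrCreate()

    # Extract: read a curated table registered in the metastore
    claims = spark.table("curated.member_claims")

    # Transform: normalize dates and drop records without an identifier
    claims = (claims
              .withColumn("claim_month", F.date_trunc("month", F.col("claim_date")))
              .filter(F.col("member_id").isNotNull()))

    # Aggregate: monthly usage profile per member
    usage_profile = (claims
                     .groupBy("member_id", "claim_month")
                     .agg(F.count("*").alias("claim_count"),
                          F.sum("claim_amount").alias("total_amount")))

    # Load: persist the aggregate for downstream reporting
    usage_profile.write.mode("overwrite").saveAsTable("analytics.member_usage_profile")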

Data Engineer

Costco
08.2022 - 05.2023
  • Created end-to-end data pipelines, starting with distributed messaging systems like Kafka for data ingestion and persisting records into Cassandra
  • Implemented Spark jobs in Scala, leveraging the Spark SQL API with RDDs, Pair RDDs, DataFrames, and Datasets for efficient data processing and transformation
  • Designed and deployed Azure Data Factory pipelines to orchestrate data flow from on-premise SQL databases to Azure Data Lake, including data staging and transformations
  • Migrated critical business data using SQL, SQL Azure, Azure Storage, and Azure Data Factory, along with tools like SSIS and PowerShell, to streamline the process
  • Configured Azure Data Factory triggers to monitor and schedule pipelines, implemented alerts to notify failures, and automated routine tasks to ensure seamless data workflows
  • Used Keras for seismic data analysis, optimizing well logs, geological information, and complex database structures in oil and gas domains
  • Automated data transfer between RDBMS and HDFS using Apache NiFi, improving operational efficiency
  • Designed and developed scalable database solutions using Azure SQL, Azure SQL Data Warehouse, and Databricks for data integration and advanced analytics
  • Analyzed customer behavior data and developed insights using NoSQL databases like MongoDB and relational databases including PostgreSQL and MySQL
  • Developed automation tools using PowerShell scripts and JSON templates to manage services and remediate infrastructure issues
  • Designed predictive maintenance models on Azure Synapse Analytics, optimizing maintenance schedules and reducing costs in industrial processes
  • Collaborated in multiple phases of the SDLC using Agile and Scrum methodologies, ensuring seamless development and delivery
  • Supported MapReduce programs and wrote custom MapReduce jobs using the Java API to analyze and process large datasets
  • Worked on Hadoop services like HDFS, Kafka, Hive, Zookeeper, Oozie, NiFi, Snowflake, PySpark, and Airflow, with experience in troubleshooting Spark Databricks clusters
  • Migrated and transformed data using Hive, Spark RDDs, Python, and Scala, converting legacy SQL queries for modern analytics pipelines
  • Deployed and maintained CI/CD pipelines with Jenkins, integrated with GIT and Maven for version control and seamless deployment workflows
  • Implemented real-time streaming data processing using Kafka brokers, Spark context, and RDDs, loading processed information into HDFS and NoSQL databases (see the sketch after this list)
  • Developed ETL processes and automation using Python-SQL frameworks, integrating analytics libraries like NumPy, SciPy, and Pandas, alongside MATLAB for advanced modeling
  • Created custom ETL solutions using Informatica, Oracle databases, PL/SQL, Python, and Shell scripting for efficient data extraction and transformation
  • Scheduled and monitored pipelines in Azure Data Factory (V2), handling diverse data sources and enabling efficient data operations
  • Wrote and tested unit cases using JUnit and Mockito frameworks to ensure robust application performance
  • Developed robust UNIX shell scripts for automating ETL workflows, reducing manual interventions
  • Utilized tools like Apache Spark for log data analysis and error prediction, while maintaining high data quality standards
  • Environment: Agile, Azure, Hadoop, Scala, Spark, PySpark, Python, HDFS, Kafka, CloudWatch, CloudFormation, Athena, MongoDB, Docker, ETL, Hive, Oozie, Zookeeper, Airflow, Snowflake, NiFi, Tableau, Git
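
A minimal sketch of the Kafka-to-HDFS streaming described above, written with Spark Structured Streaming rather than the original Spark context/RDD approach; the broker address, topic name, and HDFS paths are hypothetical, and the spark-sql-kafka connector is assumed to be available on the cluster.

    # Hedged sketch: broker, topic, and paths are placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("kafka-ingestion").getOrCreate()

    # Read a stream of events from a Kafka topic
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "store-transactions")
              .load())

    # Kafka delivers key/value as bytes; cast the payload to string for parsing
    parsed = events.select(F.col("key").cast("string").alias("event_key"),
                           F.col("value").cast("string").alias("payload"),
                           F.col("timestamp"))

    # Persist the stream to HDFS as Parquet, with checkpointing for recovery
    query = (parsed.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/raw/store_transactions")
             .option("checkpointLocation", "hdfs:///checkpoints/store_transactions")
             .outputMode("append")
             .start())

    query.awaitTermination()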

Data Engineer

Goldman Sachs
07.2021 - 08.2022
  • Designed and built event-driven ETL pipelines using AWS Glue to process new data appearing in AWS S3, ensuring proper understanding of data assets (see the sketch after this list)
  • Developed Spark workflows in Scala to retrieve data from AWS S3 buckets and Snowflake, applying necessary transformations for downstream processing
  • Created and optimized data pipelines for ingestion, aggregation, and consumer response data loading from AWS S3 into Hive external tables on HDFS, enabling Tableau dashboards
  • Utilized Apache Kafka for implementing a distributed messaging queue to integrate seamlessly with Cassandra for real-time data processing
  • Worked extensively with AWS services such as CloudWatch, CloudTrail, CloudFormation, Athena, and Glue, alongside Docker, Kubernetes, and Terraform, to build scalable solutions
  • Deployed Big Data Hadoop applications on AWS Cloud using Talend, ensuring reliability and performance
  • Developed Spark applications in both Scala and Python to process streaming data from multiple sources and conducted advanced data transformations
  • Designed and developed ETL jobs and pipelines leveraging tools such as IBM DataStage and Apache NiFi for robust data handling
  • Used Pig scripts to create MapReduce jobs, performing ETL processes on HDFS for structured and semi-structured data
  • Transformed and loaded large datasets, including structured, semi-structured, and unstructured data, using Hadoop concepts and tools like Hive and MapReduce
  • Participated in Agile-based development processes, incorporating test-driven development and pair programming for collaborative problem-solving
  • Wrote and optimized build scripts using Maven and configured the Log4J Logging framework for tracking and debugging applications
  • Converted complex Cassandra, Hive, and SQL queries into Spark transformations using Spark RDDs, Scala, and Python
  • Developed data pipelines using MapReduce, Sqoop, Flume, and Pig to analyze customer behavioral data stored in HDFS
  • Implemented CI/CD pipelines with tools like Jenkins, Maven, and Docker to automate deployment workflows and improve operational efficiency
  • Worked with visualization and analysis tools, including Tableau and Splunk, to create reports such as bar graphs and pie charts and to conduct regression analysis
  • Automated data movement between RDBMS and HDFS using Apache NiFi, improving data integration efficiency
  • Gained hands-on experience with NoSQL databases, such as HBase and Cassandra, for managing large-scale distributed data
  • Loaded and transformed data from various streaming and batch sources into Hadoop environments, leveraging advanced techniques for data preparation
  • Configured and developed ETL processes using AWS Glue, automating transformations and loading for analysis-ready datasets
  • Environment: Scrum, Hadoop, Hive, Spark, PySpark, Oozie, Storm, YARN, Snowflake, Flume, NiFi, Airflow, AWS, ETL, Maven, NoSQL, Cassandra, Scala, Python
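
A minimal sketch of an event-driven AWS Glue job of the kind described in the first bullet above; the catalog database and table names, column mappings, and S3 output path are hypothetical placeholders, and the job is assumed to be launched by a trigger when new objects land in S3.

    # Hedged sketch: catalog names, mappings, and the S3 path are placeholders.
    import sys
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the newly arrived data through the Glue Data Catalog
    source = glue_context.create_dynamic_frame.from_catalog(
        database="raw_trades", table_name="daily_positions")

    # Rename and retype columns on the way through
    mapped = ApplyMapping.apply(
        frame=source,
        mappings=[("acct_id", "string", "account_id", "string"),
                  ("pos_qty", "long", "quantity", "long"),
                  ("as_of", "string", "as_of_date", "date")])

    # Write curated Parquet back to S3 for Hive external tables and Tableau
    glue_context.write_dynamic_frame.from_options(
        frame=mapped,
        connection_type="s3",
        connection_options={"path": "s3://curated-bucket/daily_positions/"},
        format="parquet")

    job.commit()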

Data Analyst

Indian Com
08.2019 - 07.2021
  • Imported data from MySQL into HDFS using Sqoop, applied transformations using Hive and MapReduce, and loaded processed data back into HDFS
  • Developed PySpark programs to read and write multiple data formats, including JSON, ORC, and Parquet, and stored them on HDFS for downstream processing (see the sketch after this list)
  • Utilized Spark-SQL to load and process JSON data, creating Schema RDDs and loading structured data into Hive tables for advanced analytics
  • Created a PySpark framework to migrate data from DB2 into Amazon S3, enabling efficient cloud storage solutions
  • Built complex MapReduce streaming jobs in Java for ETL tasks, including data cleaning, scrubbing, and transformations, seamlessly integrating with Hive and Pig
  • Leveraged Agile and Scrum methodologies to actively participate in sprints, refine requirements, and ensure effective collaboration across teams
  • Designed Hive and Pig scripts to handle transformations, event joins, traffic filtering, and pre-aggregations before storing data in HDFS for long-term storage
  • Created validation queries to verify ETL processes after each run, shared results with business users, and ensured data integrity during various project phases
  • Converted SQL and Hive queries into Spark transformations using Spark RDDs, Scala, and Python to enable faster and more scalable data processing
  • Built HBase tables to store large volumes of semi-structured data from diverse sources, ensuring scalability and reliability in NoSQL environments
  • Developed Spark pipelines using Spark RDDs and DataFrames to process large datasets, optimize performance, and reduce processing times
  • Used Spark with Scala to create ETL pipelines, handle real-time data ingestion, and enable seamless integration with structured and semi-structured data sources
  • Designed and executed workflows to support the migration of legacy databases to cloud-based storage systems using AWS S3 and Snowflake
  • Designed and implemented data transformations using PySpark, ensuring efficient data processing for analytics applications
  • Integrated and processed large datasets using Spark for faster insights, while handling structured, semi-structured, and unstructured data with ease
  • Imported and processed real-time data from RDBMS sources into HDFS, applying Hive for querying and MapReduce for parallel processing
  • Developed monitoring dashboards to track ETL pipeline performance, ensuring timely identification of bottlenecks and troubleshooting errors effectively
  • Environment: Agile, AWS, S3, EC2, Redshift, RDS, HDFS, HBase, MapReduce, Hive, Spark, Spark SQL, Zookeeper, MongoDB, Git, R, Python, ETL
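
A minimal PySpark sketch of the multi-format read/write work noted above; the HDFS paths and the event_id column are hypothetical placeholders.

    # Hedged sketch: paths and the event_id column are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("format-conversion").getOrCreate()

    # Read raw JSON events landed on HDFS
    events = spark.read.json("hdfs:///data/landing/events/")

    # Light cleanup: drop fully empty rows and de-duplicate on the record key
    cleaned = events.dropna(how="all").dropDuplicates(["event_id"])

    # Write columnar copies for downstream consumers: Parquet for analytics,
    # ORC for Hive-centric workloads
    cleaned.write.mode("overwrite").parquet("hdfs:///data/curated/events_parquet/")
    cleaned.write.mode("overwrite").orc("hdfs:///data/curated/events_orc/")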

Education

Master of Science

Oklahoma Christian University
Edmond, OK
12.2025

Skills

  • ETL development
  • Data warehousing
  • Data modeling
  • Data pipeline design

Tools And Technologies

Hadoop, Apache Spark, HDFS, MapReduce, Sqoop, Hive, Oozie, Zookeeper, Kafka, NiFi, Databricks, Snowflake, Informatica, Talend, Power BI, Tableau, Java, Python, Scala, SQL, Shell Scripting, R, MySQL, Oracle, PostgreSQL, MS SQL Server, NoSQL, HBase, Cassandra, DynamoDB, MongoDB, Cloudera, Git, Bitbucket, SVN, CVS, Linux, Unix, macOS, Windows, Azure, AWS

Timeline

Data Engineer

CVS Health
11.2023 - Current

Data Engineer

Costco
08.2022 - 05.2023

Data Engineer

Goldman Sachs
07.2021 - 08.2022

Data Analyst

Indian Com
08.2019 - 07.2021

Master of Science

Oklahoma Christian University