
Revanth Kumar

Irving, TX

Summary

  • Over 5 years of experience as a Senior Data Engineer with strong technical expertise, business experience, and communication skills to drive high-impact business outcomes.
  • Experience in the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
  • Experience with Spark Core, Spark SQL, Spark MLlib, Spark GraphX, and Spark Streaming for processing and transforming complex data using in-memory computing capabilities written in Scala.
  • Worked with Spark to improve the efficiency of existing algorithms using Spark Context, Spark SQL, Spark MLlib, DataFrames, Pair RDDs, and Spark on YARN.
  • Experience in Python and Scala; wrote user-defined functions (UDFs) for Hive and Pig using Python.
  • Hands-on experience with Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and Hadoop MapReduce programming.
  • Experience with Airflow to schedule ETL jobs that extract data from the AWS data warehouse.
  • Proficient in designing, implementing, and optimizing ETL processes using Talend, leveraging its suite of data integration tools to ensure seamless data movement and transformation across systems and platforms.
  • Experienced in Informatica PowerCenter for ETL development, including mapping design, workflow creation, and performance tuning, to deliver efficient data pipelines that meet business requirements within stringent timelines.
  • Experience performing structural modifications using MapReduce and Hive and analyzing data using visualization/reporting tools (Tableau).
  • Experienced in fact and dimension modeling (star schema, snowflake schema), transactional modeling, and slowly changing dimensions (SCD).
  • Experience creating and running Docker images with multiple microservices.
  • Hands-on experience with Amazon Web Services (AWS), using Elastic MapReduce (EMR), Redshift, and EC2 for data processing.
  • Experience with PySpark and Azure Data Factory in creating, developing, and deploying high-performance ETL pipelines.
  • Experience developing JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the Cosmos activity.
  • Developed Spark jobs on Databricks to perform tasks such as data cleansing, data validation, and standardization, then applied transformations as per the use cases.
  • Hands-on experience with SQL and NoSQL databases such as Snowflake, HBase, Cassandra, and MongoDB.
  • Extensive experience in Agile software development methodology.
  • Team player, able to work independently with minimum supervision; innovative and efficient, good at debugging, and driven to keep pace with the latest technologies.
  • Excellent communication and presentation skills, with good experience communicating and working with various stakeholders.

Overview

6 years of professional experience

Work History

Data Engineer

Paychex
12.2022 - Current
  • Worked with business users to gather and define business requirements and analyze possible technical solutions
  • Developed Spark scripts using Python and Scala shell commands as per requirements
  • Wrote Spark jobs with RDDs, Pair RDDs, transformations and actions, and DataFrames for data transformations from relational data sets
  • Designed and developed Scala workflows to pull data from cloud-based systems and apply transformations on it
  • Responsible for building scalable distributed data solutions using Hadoop
  • Developed highly complex Python and Scala code that is maintainable, easy to use, and satisfies application requirements for data processing and analytics using built-in libraries
  • Developed PySpark script to merge static and dynamic files and cleanse the data
  • Proficient in utilizing Talend for ETL processes at Paychex, employing its functionalities to design, develop, and maintain robust data pipelines, ensuring efficient and accurate data integration across diverse sources and destinations
  • Demonstrated expertise in optimizing Talend jobs for performance and scalability, implementing best practices to streamline data transformations, improve workflow efficiency, and enhance overall data quality within Paychex's data infrastructure
  • Developed an ETL framework using Spark and Hive (including daily runs, error handling, and logging) to deliver useful data
  • Developed ETL Specification Design document containing detailed information on ETL processing, mapping/workflow specifications, exception handling process, staging and data warehouse schemas, etc
  • Developed ETL code, control files, metadata, and lineage diagrams for ETL programs
  • Developed Tableau data visualizations using cross tabs, heat maps, box-and-whisker charts, scatter plots, geographic maps, pie charts, bar charts, and density charts
  • Created Tableau dashboards/reports for the business users
  • Developed an end-to-end analytical environment using Power BI
  • Created basic reports in Power BI using confidential files as the source to fetch the data
  • Designed and developed Power BI graphical and visualization solutions with business requirement documents and plans for creating interactive dashboards
  • Automated resulting scripts and workflow using Apache Airflow to ensure daily execution in production
  • Created Airflow DAGs to sync files from Box, analyze data quality, and alert on missing files
  • Utilized AWS services with a focus on big data architecture/analytics/enterprise data warehouse and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability, and performance, and to provide meaningful and valuable information for better decision-making
  • Wrote Pig scripts for sorting, joining, filtering, and grouping data
  • Extracted data from the Teradata database and loaded it into the data warehouse using Spark
  • Implemented a Continuous Delivery pipeline with Docker and GitHub
  • Worked on Snowflake schemas and data warehousing and processed batch and streaming data load pipelines using Snowpipe and Matillion from the data lake (Confidential AWS S3 bucket)
  • Performed analysis on the unused user navigation data by loading into HDFS and writing MapReduce jobs
  • Created scripts to read CSV, JSON and parquet files from S3 buckets in Python and load into AWS S3, DynamoDB and Snowflake
  • Used SQL queries and other tools to perform data analysis and profiling
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.

Data Engineer

Walgreens Boots Alliance
06.2021 - 05.2022
  • Interacted with clients to gather business and system requirements which involved documentation of processes based on the user requirements
  • Developed Spark scripts by writing custom RDDs in Scala for data transformations and performing actions on RDDs
  • Developed Spark jobs on Databricks to perform tasks like data cleansing, data validation, standardization, and then applied transformations as per the use cases
  • Developed highly complex Python and Scala code that is maintainable, easy to use, and satisfies application requirements for data processing and analytics using built-in libraries
  • Designed the number of partitions and the replication factor for Kafka topics based on business requirements; worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using Python (PySpark)
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala
  • Worked on migrating MapReduce programs into Spark transformations using Scala
  • Developed ETL jobs using PySpark with data lineage, in which the data is transformed in multiple stages and actions such as aggregations are performed
  • Implemented and maintained multiple ETL processes to synchronize data between different source systems and databases
  • Spearheaded the integration of Talend for ETL processes at Walgreens Boots Alliance, optimizing data extraction, transformation, and loading operations, resulting in a significant reduction in data processing time and improved data accuracy
  • Developed Spark applications in Databricks using PySpark and Spark SQL to perform transformations and aggregations on source data before loading it into Azure Synapse Analytics for reporting
  • Involved in creating multiple kinds of reports in Power BI and presenting them using Story Points
  • Built ad-hoc reports in Power BI based on business requirements as one of the major responsibilities
  • Involved in creating, debugging, scheduling, and monitoring jobs using Airflow for ETL batch processing to load into Snowflake for analytical processes
  • Worked on Tableau to build customized interactive reports, worksheets and dashboards
  • Used Tableau to produce dashboards comparing results before and after using this solution in the staging environment
  • Prepared scripts to automate the ingestion process using Python and Scala as needed from various sources such as APIs, AWS S3, Teradata, and Snowflake
  • Implemented a Continuous Delivery pipeline with Docker and GitHub
  • Created scripts to read CSV, JSON and parquet files from S3 buckets in Python and load into AWS S3, DynamoDB and Snowflake
  • Responsible for modifying the code, debugging, and testing the code before deploying on the production cluster
  • Worked on designing, building, deploying, and maintaining MongoDB
  • Implemented SQL and PL/SQL stored procedures
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.

Data Engineer

Intact Insurance
03.2020 - 05.2021
  • Involved in the requirements gathering phase, working with business users to continuously accommodate changing user requirements
  • Developed various Spark applications using Scala to perform various enrichments of clickstream data merged with user profile data
  • Designed and developed Spark workflows using Scala to pull data from AWS S3 buckets and Snowflake and apply transformations on it
  • Developed simple to complex MapReduce jobs using Hive and Pig
  • Profiled structured, unstructured, and semi-structured data across various sources to identify patterns and implemented data quality metrics using the necessary queries or Python scripts based on the source
  • Worked on PySpark APIs for data transformations
  • Prepared dashboards using Tableau for summarizing Configuration, Quotes, Orders and other e-commerce data
  • Extracted, transformed, and loaded (ETL) data from multiple federated data sources (JSON, relational databases, etc.) with DataFrames in Spark
  • Migrated an existing on-premises application to AWS
  • Used AWS services such as EC2 and S3 for small data set processing and storage; experienced in maintaining the Hadoop cluster on AWS EMR
  • Implemented AWS Elastic Container Service (ECS) scheduler to automate application deployment in the cloud using Docker Automation techniques
  • Analyzed the SQL scripts and designed the solution to implement using PySpark
  • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them
  • Used SQL queries and other tools to perform data analysis and profiling
  • Followed agile methodology and involved in daily SCRUM meetings, sprint planning, showcases and retrospective.

Data Engineer

Christus Health
01.2018 - 02.2020
  • Interacted with the business analysts to gather requirements and understand the functional design specifications for the requirements
  • Developed Spark applications for performing large-scale transformations and denormalization of relational datasets
  • Developed various Spark applications using Scala to perform various enrichments of clickstream data merged with user profile data
  • Worked on developing a PySpark script to encrypt raw data by applying hashing algorithms to client-specified columns
  • Performed Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export through Python
  • Performed ETL testing activities such as running the jobs, extracting data from the database using the necessary queries, transforming it, and loading it into the data warehouse servers
  • Wrote multiple MapReduce programs for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats
  • Involved in creating, modifying SQL queries, prepared statements and stored procedures used by the application
  • Followed AGILE (SCRUM) methodologies, had sprint planning every two weeks, and set up daily meetings to monitor the status
  • Participated in the status meetings and status updating to the management team.

Education

Master's in Computer and Information Sciences

Southern Arkansas University
Magnolia, AR
12.2023

Bachelor of Technology in Computer Science and Engineering

Jawaharlal Nehru Technological University
Kakinada, India
04.2019

Skills

  • Python
  • SQL
  • Scala
  • MATLAB
  • Java
  • Snowflake
  • AWS RDS
  • Teradata
  • Oracle
  • MySQL
  • Microsoft SQL
  • PostgreSQL
  • Data Lakes
  • Talend
  • Informatica
  • SSIS
  • AWS Glue
  • AWS
  • Azure
  • GCP (Docker, Kubernetes for Containerization)
  • ETL development
  • API Development
  • NoSQL Databases
  • Data Warehousing
  • Scripting Languages
  • SQL Expertise
  • Big Data Processing
  • Data Pipeline Design
  • Spark Framework
  • Data Analysis
  • SQL Programming
  • Data Migration

References

Will be provided upon request.
