
Revanth Kumar

Irving, TX

Summary

  • Over 5 years of experience as a Senior Data Engineer with strong technical expertise, business experience, and communication skills to drive high-impact business outcomes.
  • Experience in the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
  • Experience with Spark Core, Spark SQL, Spark MLlib, Spark GraphX, and Spark Streaming for processing and transforming complex data using in-memory computing capabilities written in Scala.
  • Worked with Spark to improve the efficiency of existing algorithms using Spark Context, Spark SQL, Spark MLlib, DataFrames, Pair RDDs, and Spark on YARN.
  • Experience in Python and Scala; wrote user-defined functions (UDFs) for Hive and Pig using Python.
  • Hands-on experience with Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and Hadoop MapReduce programming.
  • Experience with Airflow to schedule ETL jobs that extract data from the AWS data warehouse.
  • Proficient in designing, implementing, and optimizing ETL processes using Talend, leveraging its suite of data integration tools to ensure seamless data movement and transformation across systems and platforms.
  • Experienced in Informatica PowerCenter for ETL development, including mapping design, workflow creation, and performance tuning, to deliver efficient data pipelines that meet business requirements within stringent timelines.
  • Experience performing structural modifications using MapReduce and Hive and analyzing data using visualization/reporting tools (Tableau).
  • Experienced in fact and dimension modeling (star schema, snowflake schema), transactional modeling, and slowly changing dimensions (SCD).
  • Experience creating and running Docker images with multiple microservices.
  • Hands-on experience with Amazon Web Services (AWS), using Elastic MapReduce (EMR), Redshift, and EC2 for data processing.
  • Experience with PySpark and Azure Data Factory in creating, developing, and deploying high-performance ETL pipelines.
  • Experience developing JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the Cosmos activity.
  • Developed Spark jobs on Databricks to perform tasks such as data cleansing, data validation, and standardization, then applied transformations as per the use cases.
  • Hands-on experience with SQL and NoSQL databases such as Snowflake, HBase, Cassandra, and MongoDB.
  • Extensive experience in Agile software development methodology.
  • Team player, able to work independently with minimum supervision; innovative and efficient, good at debugging, and driven to keep pace with the latest technologies.
  • Excellent communication and presentation skills, with good experience communicating and working with various stakeholders.

Overview

6 years of professional experience

Work History

Data Engineer

Paychex
12.2022 - Current
  • Worked with business users to gather and define business requirements and analyze possible technical solutions
  • Developed Spark scripts using Python and Scala shell commands as per requirements
  • Wrote Spark jobs with RDDs, Pair RDDs, transformations and actions, and DataFrames for data transformations from relational data sets
  • Designed and developed Scala workflows to pull data from cloud-based systems and apply transformations on it
  • Responsible for building scalable distributed data solutions using Hadoop
  • Developed highly complex Python and Scala code that is maintainable, easy to use, and satisfies application requirements for data processing and analytics using built-in libraries
  • Developed PySpark script to merge static and dynamic files and cleanse the data
  • Proficient in utilizing Talend for ETL processes at Paychex, employing its functionalities to design, develop, and maintain robust data pipelines, ensuring efficient and accurate data integration across diverse sources and destinations
  • Demonstrated expertise in optimizing Talend jobs for performance and scalability, implementing best practices to streamline data transformations, improve workflow efficiency, and enhance overall data quality within Paychex's data infrastructure
  • Developed an ETL framework using Spark and Hive (including daily runs, error handling, and logging) to deliver useful data
  • Developed ETL Specification Design document containing detailed information on ETL processing, mapping/workflow specifications, exception handling process, staging and data warehouse schemas, etc
  • Developed ETL code, control files, metadata, and lineage diagrams for ETL programs
  • Developed Tableau data visualizations using cross tabs, heat maps, box-and-whisker charts, scatter plots, geographic maps, pie charts, bar charts, and density charts
  • Created Tableau dashboards/reports for the business users
  • Developed an end-to-end analytical environment using Power BI
  • Created basic reports in Power BI using confidential files as the source to fetch the data
  • Designed and developed Power BI graphical and visualization solutions with business requirement documents and plans for creating interactive dashboards
  • Automated resulting scripts and workflow using Apache Airflow to ensure daily execution in production
  • Created Airflow DAGs to sync files from Box, analyze data quality, and alert on missing files
  • Utilized AWS services with a focus on big data architecture/analytics/enterprise data warehouse and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability, and performance, and to provide meaningful and valuable information for better decision-making
  • Wrote Pig scripts for sorting, joining, filtering, and grouping data
  • Extracted data from the Teradata database and loaded it into the data warehouse using Spark
  • Implemented a Continuous Delivery pipeline with Docker and GitHub
  • Worked on Snowflake schemas and data warehousing and processed batch and streaming data load pipelines using Snowpipe and Matillion from the data lake (Confidential AWS S3 bucket)
  • Performed analysis on the unused user navigation data by loading into HDFS and writing MapReduce jobs
  • Created scripts to read CSV, JSON and parquet files from S3 buckets in Python and load into AWS S3, DynamoDB and Snowflake
  • Used SQL queries and other tools to perform data analysis and profiling
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.

Data Engineer

Walgreens Boots Alliance
06.2021 - 05.2022
  • Interacted with clients to gather business and system requirements which involved documentation of processes based on the user requirements
  • Developed Spark scripts by writing custom RDDs in Scala for data transformations and performing actions on RDDs
  • Developed Spark jobs on Databricks to perform tasks like data cleansing, data validation, standardization, and then applied transformations as per the use cases
  • Developed highly complex Python and Scala code that is maintainable, easy to use, and satisfies application requirements for data processing and analytics using built-in libraries
  • Designed the number of partitions and the replication factor for Kafka topics based on business requirements; worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using Python (PySpark)
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala
  • Worked on migrating MapReduce programs into Spark transformations using Scala
  • Developed ETL jobs using PySpark with data lineage, in which the data is transformed in multiple stages and actions such as aggregations are performed
  • Implemented and maintained multiple ETL processes to synchronize data between different source systems and databases
  • Spearheaded the integration of Talend for ETL processes at Walgreens Boots Alliance, optimizing data extraction, transformation, and loading operations, resulting in a significant reduction in data processing time and improved data accuracy
  • Developed Spark applications in Databricks using PySpark and Spark SQL to perform transformations and aggregations on source data before loading it into Azure Synapse Analytics for reporting
  • Involved in creating multiple kinds of reports in Power BI and presenting them using Story Points
  • Built ad-hoc reports in Power BI based on business requirements as one of the major responsibilities
  • Involved in creating, debugging, scheduling, and monitoring jobs using Airflow for ETL batch processing to load into Snowflake for analytical processes
  • Worked on Tableau to build customized interactive reports, worksheets and dashboards
  • Used Tableau to produce dashboards comparing results before and after using this solution in the staging environment
  • Prepared scripts to automate the ingestion process using Python and Scala as needed from various sources such as APIs, AWS S3, Teradata, and Snowflake
  • Implemented a Continuous Delivery pipeline with Docker and GitHub
  • Created scripts to read CSV, JSON and parquet files from S3 buckets in Python and load into AWS S3, DynamoDB and Snowflake
  • Responsible for modifying the code, debugging, and testing the code before deploying on the production cluster
  • Worked on designing, building, deploying, and maintaining MongoDB
  • Implemented SQL and PL/SQL stored procedures
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.

Data Engineer

Intact Insurance
03.2020 - 05.2021
  • Involved in the requirements gathering phase, working with business users to continuously accommodate changing user requirements
  • Developed various Spark applications using Scala to perform various enrichments of clickstream data merged with user profile data
  • Designed and developed Spark workflows using Scala to pull data from AWS S3 buckets and Snowflake and apply transformations on it
  • Developed simple to complex MapReduce jobs using Hive and Pig
  • Profiled structured, unstructured, and semi-structured data across various sources to identify patterns and implemented data quality metrics using the necessary queries or Python scripts based on the source
  • Worked on PySpark APIs for data transformations
  • Prepared dashboards using Tableau for summarizing Configuration, Quotes, Orders and other e-commerce data
  • Extracted, transformed, and loaded (ETL) data from multiple federated data sources (JSON, relational databases, etc.) with DataFrames in Spark
  • Migrated an existing on-premises application to AWS
  • Used AWS services such as EC2 and S3 for small data set processing and storage; experienced in maintaining the Hadoop cluster on AWS EMR
  • Implemented AWS Elastic Container Service (ECS) scheduler to automate application deployment in the cloud using Docker Automation techniques
  • Analyzed the SQL scripts and designed the solution to implement using PySpark
  • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them
  • Used SQL queries and other tools to perform data analysis and profiling
  • Followed agile methodology and involved in daily SCRUM meetings, sprint planning, showcases and retrospective.

Data Engineer

Christus Health
01.2018 - 02.2020
  • Interacted with the business analysts to gather requirements and understand the functional design specifications for the requirements
  • Developed Spark applications for performing large-scale transformations and denormalization of relational datasets
  • Developed various Spark applications using Scala to perform various enrichments of clickstream data merged with user profile data
  • Worked on developing a PySpark script to encrypt raw data by applying hashing algorithms to client-specified columns
  • Performed Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export through Python
  • Performed ETL testing activities such as running the jobs, extracting data from the database using the necessary queries, transforming it, and loading it into the data warehouse servers
  • Wrote multiple MapReduce programs for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats
  • Involved in creating, modifying SQL queries, prepared statements and stored procedures used by the application
  • Followed AGILE (SCRUM) methodologies, had sprint planning every two weeks, and set up daily meetings to monitor the status
  • Participated in the status meetings and status updating to the management team.

Education

Master's in Computer and Information Sciences

Southern Arkansas University
Magnolia, AR
12.2023

Bachelor of Technology in Computer Science and Engineering

Jawaharlal Nehru Technological University
Kakinada, India
04.2019

Skills

  • Python
  • SQL
  • Scala
  • MATLAB
  • Java
  • Snowflake
  • AWS RDS
  • Teradata
  • Oracle
  • MySQL
  • Microsoft SQL
  • PostgreSQL
  • Data Lakes
  • Talend
  • Informatica
  • SSIS
  • AWS Glue
  • AWS
  • Azure
  • GCP (Docker, Kubernetes for Containerization)
  • ETL development
  • API Development
  • NoSQL Databases
  • Data Warehousing
  • Scripting Languages
  • SQL Expertise
  • Big Data Processing
  • Data Pipeline Design
  • Spark Framework
  • Data Analysis
  • SQL Programming
  • Data Migration

References

Will be provided upon request.
