
Ravindra Kavutharapu

Chicago, IL

Summary

  • Over 5 years of experience in data engineering, data pipeline design, development, and implementation as a Data Engineer/Data Developer, with expertise in business intelligence, data warehousing, ETL, and big data technologies.
  • Highly skilled, detail-oriented data engineer with a proven track record of designing, developing, and maintaining robust data pipelines and systems; proficient in multiple programming languages and frameworks, and dedicated to ensuring data accuracy, availability, and scalability for efficient decision-making and business growth.
  • Experience developing MapReduce programs using Apache Hadoop to analyze big data according to requirements.
  • Experience setting up Hadoop clusters on cloud platforms such as AWS and GCP.
  • Experience with Google Cloud Platform technologies including BigQuery, Dataflow, Dataproc, Pub/Sub, and Airflow.
  • Experience designing, building, and maintaining complex data pipelines and systems that leverage the Apache Druid data store within the GCP ecosystem.
  • Experience in data security and privacy: protecting data from unauthorized access, ensuring compliance with privacy regulations, implementing encryption measures, and establishing data management and governance.
  • Responsible for developing, supporting, and maintaining ETL (Extract, Transform, Load) processes using Informatica PowerCenter.
  • Strong background in designing, testing, and maintaining data management systems, with skills in database design and data mining and adeptness at using machine learning to improve business decision-making; previous work optimized data retrieval processes and improved system efficiency.

Overview

5 years of professional experience
1 Certification

Work History

GCP Data Engineer

WELLCARE HEALTH PLANS
Tampa, Florida
02.2024 - Current
  • Developed pipelines for auditing all application metrics using GCP Cloud Functions, as well as Dataflow, for a pilot project.
  • Built a program in Python and Apache Beam that runs data validation between raw source files and BigQuery tables in Cloud Dataflow (a minimal sketch follows this list).
  • Developed a Python script to load CSV files into S3 buckets; created the buckets, performed folder management in each bucket, and managed logs and objects within each bucket.
  • Worked with Kafka and integrated it with Spark Streaming.
  • Created Airflow DAGs in Python using the Airflow libraries.
  • Created and maintained Snowflake tables and views.
  • Participated in weekly release meetings with technology stakeholders to identify and mitigate release-related risks.
  • Processed data at scale using S3's integrations with numerous AWS services, including AWS Lambda, AWS Glue, and Amazon EMR.
  • Used these services to extract, transform, and load (ETL) data from S3 into other services for additional analysis and visualization.
  • Designed star schema and snowflake schema models for data warehouse and ODS architectures.
  • Used OpenShift containerization to manage sizable data processing and storage systems.
  • Deployed and managed containers across a cluster of machines with OpenShift, which helps handle growing data loads and processing demands.
  • Wrote PL/SQL statements: stored procedures, functions, triggers, and packages.
  • Maintained and developed Docker images for a tech stack including Cassandra, Kafka, Apache, and several in-house written Java services running in Google Cloud Platform (GCP) on Kubernetes.
  • Environment: GCP, BigQuery, GCS, Spark, Spark-Streaming, Spark SQL, HDFS, Hive, Pig, Apache Kafka, Sqoop, Python, MySQL, Snowflake.
  • Optimized existing queries to improve query performance by creating indexes on tables.
  • Built various dashboards with interactive visualizations using D3.js library.
  • Developed and implemented data models, database designs, data access and table maintenance codes.
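The Beam/Dataflow validation described above broadly follows the pattern sketched here. This is a minimal, hypothetical sketch: the project, bucket, table, and key column names are illustrative assumptions, not the actual resources. It keys records from a raw CSV in GCS, joins them against rows read from BigQuery, and writes any mismatches to a report file.

    # Hypothetical Beam pipeline: validate raw CSV records against a BigQuery table.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def parse_csv(line):
        # Assume the first comma-separated field is the record key.
        fields = line.split(",")
        return (fields[0], line)

    def flag_mismatch(element):
        key, grouped = element
        # A key present on only one side (file or table) is a mismatch.
        if not grouped["source"] or not grouped["bq"]:
            yield f"mismatch for key {key}: source={len(grouped['source'])}, bq={len(grouped['bq'])}"

    def run():
        options = PipelineOptions(runner="DataflowRunner", project="my-project",
                                  region="us-central1", temp_location="gs://my-bucket/tmp")
        with beam.Pipeline(options=options) as p:
            source = (p
                      | "ReadRaw" >> beam.io.ReadFromText("gs://my-bucket/raw/members.csv",
                                                          skip_header_lines=1)
                      | "KeySource" >> beam.Map(parse_csv))
            bq = (p
                  | "ReadBQ" >> beam.io.ReadFromBigQuery(table="my-project:audit.members")
                  | "KeyBQ" >> beam.Map(lambda row: (str(row["member_id"]), row)))
            ({"source": source, "bq": bq}
             | "Join" >> beam.CoGroupByKey()
             | "Validate" >> beam.FlatMap(flag_mismatch)
             | "WriteReport" >> beam.io.WriteToText("gs://my-bucket/validation/mismatches"))

    if __name__ == "__main__":
        run()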

Data Engineer

Verizon
Irving, TX
01.2023 - 12.2023
  • Designed and implemented data ingestion pipelines to bring data from various sources into Druid (a minimal sketch follows this list).
  • Defined and created Druid schemas and data segments to optimize data storage and query performance.
  • Implemented data transformation processes to preprocess and clean data before loading it into Druid.
  • Optimized Druid queries and indexes for fast and efficient query performance.
  • Integrated Druid with other GCP services and data warehouses for comprehensive data analytics solutions.
  • Continuously analyzed and optimized the performance of Druid queries and data loading processes, and identified opportunities to enhance overall system efficiency.
  • Performed routine maintenance tasks, such as cluster upgrades and data purging.
  • Involved in production monitoring and in identifying solutions to issues.
  • Identified failures of tasks related to the Druid production environment.
  • Implemented data backup and disaster recovery plans to ensure data availability and integrity in case of failures.
  • Environment: Druid, GCP, ArgoCD, Kubernetes, Gitlab, Airflow.
  • Optimized existing queries to improve query performance by creating indexes on tables.
  • Built various dashboards with interactive visualizations using D3.js library.
  • Developed and implemented data models, database designs, data access and table maintenance codes.
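As context for the Druid ingestion work above, the sketch below shows one common way to submit a native batch ingestion spec to the Druid Overlord from Python. It is a hypothetical illustration: the Overlord URL, datasource name, GCS path, column names, and granularities are assumptions, not the actual production configuration.

    # Hypothetical submission of a Druid native batch ingestion spec via the Overlord API.
    import requests

    OVERLORD = "http://druid-overlord.example.internal:8081"  # assumed endpoint

    spec = {
        "type": "index_parallel",
        "spec": {
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "google", "uris": ["gs://my-bucket/events/2023-06-01.json"]},
                "inputFormat": {"type": "json"},
            },
            "dataSchema": {
                "dataSource": "network_events",
                "timestampSpec": {"column": "event_time", "format": "iso"},
                "dimensionsSpec": {"dimensions": ["device_id", "region", "status"]},
                "granularitySpec": {"segmentGranularity": "day", "queryGranularity": "hour"},
            },
            "tuningConfig": {"type": "index_parallel", "maxNumConcurrentSubTasks": 4},
        },
    }

    # Submit the ingestion task, then poll its status.
    task_id = requests.post(f"{OVERLORD}/druid/indexer/v1/task", json=spec).json()["task"]
    status = requests.get(f"{OVERLORD}/druid/indexer/v1/task/{task_id}/status").json()
    print(status["status"]["status"])  # e.g. RUNNING / SUCCESS / FAILED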

Cloud Engineer Intern

Lewis University
Chicago, IL
06.2022 - 12.2022
  • Developed and implemented cloud-based solutions for clients using Amazon Web Services.
  • Configured, monitored, and maintained cloud infrastructure such as virtual machines, networks, and storage.
  • Performed system administration tasks to maintain high availability of cloud services.
  • Created automated scripts to deploy applications in the cloud environment.
  • Analyzed existing architectures and developed strategies for migrating workloads from on-premises systems to the cloud.
  • Designed cost-effective solutions that leverage public clouds while meeting customer requirements.
  • Optimized application performance by monitoring serverless computing platforms like AWS Lambda or Azure Functions.
  • Deployed container orchestration tools such as Kubernetes or Docker Swarm on different public clouds.
  • Managed access control policies for users accessing data stored on the Cloud platform.
  • Ensured compliance with industry standards and regulations when deploying applications in the Cloud.
  • Collaborated with architects and engineers to design networks, systems and storage environments that reflected business needs, security specifications, and service level requirements.
  • Selected appropriate AWS service based on compute, data or security requirements.
  • Applied best practices by creating systems with fast load times, multiple browser support, and minimal memory usage.
  • Established backup and retention policies to safeguard critical data in the cloud.
  • Developed and maintained CI/CD pipelines for seamless code deployment to cloud platforms.
  • Documented cloud architectures and processes for knowledge sharing and compliance purposes.
  • Provided technical support and training to teams on cloud technologies and best practices.
  • Engineered disaster recovery strategies in cloud environments, ensuring business continuity.
  • Secured cloud applications and data using IAM policies and encryption methods.
  • Optimized cloud resource utilization and expenditure using cost management tools.
  • Evaluated emerging technology factors around cost comparison, portability or usability.
  • Maintained positive working relationship with fellow staff and management.

Data Engineer

GLOBE ACTIVE TECHNOLOGIES LTD
Hyderabad, India
08.2019 - 05.2021
  • Created IAM users, groups, roles, identity providers, and policies, and applied them to IAM users and groups.
  • Involved in a data migration project moving multiple applications from on-premises to AWS.
  • Implemented partitioning, bucketing, and map-side joins.
  • Used parallel execution to optimize Hive queries, reducing execution time from hours to minutes.
  • Applied data visualization techniques in Power BI to create visually appealing, interactive reports and dashboards that provide insight into the migrated data, helping decision-makers understand the data and make sound decisions.
  • Created Hive scripts for analyzing requirements and processing data, designing the cluster to handle huge amounts of data and cross-examine data loaded through Hive and MapReduce jobs.
  • Designed structured and scalable data pipelines for capturing, processing, and storing data generated by supply chain applications.
  • Used technologies such as Apache Spark and Apache Kafka to handle large amounts of data while ensuring data quality and optimizing pipeline performance.
  • Ensured data integrity and quality within supply chain applications by implementing data cleansing techniques such as data validation checks and data profiling processes.
  • Created a PySpark script to set up the data pipeline and built ETL streams with Databricks (a minimal sketch follows this list).
  • Knowledgeable in data sourcing and in exposing data via REST APIs.
  • Wrote Bash scripts to automate tasks and perform system administration duties.
  • Migrated and integrated 250TB of data to the AWS cloud, resulting in a 25% reduction in storage costs
  • Automated the ingestion of 500 million daily records from 16 sources into a unified system, achieving an accuracy rate of 85%
  • Used advanced AWS EMR and Glue for big data infrastructure, allowing for rapid scaling of computation across clusters and a 50% improvement in streaming capabilities
  • Created and deployed S3 buckets and Lambda functions to enable data-driven insights into over 2 million daily events, resulting in a 25% increase in accuracy
  • Set up and deployed Docker, Kubernetes, and Airflow clusters for data processing applications, resulting in a 40% cost savings
  • Created CI/CD pipelines and custom tools to manage 900+ compute jobs in 10% less time than the industry average
  • Collaborated with cross-functional teams to develop data processing and delivery automation solutions using AWS services such as Lambda, S3, and RDS; saved 100+ hours per week and increased delivery speed by 50%
  • Extensive system analysis was performed on legacy systems, identifying inefficiencies that were eliminated by implementing E-R/Dimensional Data Modelling; average query time was reduced by 50%, saving the team 100+ hours per month
  • Used Apache Spark to build custom data ingestion pipelines to process over 200TB of data twice a week, increasing raw ingest speed by 15%
  • Implemented performance tuning techniques by identifying and resolving the bottlenecks in source, target, transformations, mappings and sessions to improve performance
  • Environment: Hadoop, HDFS, MapReduce, Hive, Oozie, Sqoop, AWS, Apache Spark, Docker, Kubernetes, Databricks, ETL, Big Data, Airflow
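The PySpark/Databricks ETL work above generally follows the extract-transform-load pattern sketched below. The S3 paths, column names, and validation rules are hypothetical placeholders, not the actual pipeline.

    # Hypothetical PySpark ETL step: read raw records, cleanse/validate, write curated output.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("supply-chain-etl").getOrCreate()

    # Extract: read raw shipment records from S3.
    raw = spark.read.option("header", True).csv("s3a://my-bucket/raw/shipments/")

    # Transform: deduplicate, drop null keys, cast and validate quantities, normalize dates.
    clean = (raw
             .dropDuplicates(["shipment_id"])
             .filter(F.col("shipment_id").isNotNull())
             .withColumn("quantity", F.col("quantity").cast("int"))
             .filter(F.col("quantity") > 0)
             .withColumn("ship_date", F.to_date("ship_date", "yyyy-MM-dd")))

    # Load: write curated data back to S3, partitioned by date for downstream queries.
    (clean.write
          .mode("overwrite")
          .partitionBy("ship_date")
          .parquet("s3a://my-bucket/curated/shipments/"))

    spark.stop()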

Education

Master's Degree - Computer Science, Cloud Computing

LEWIS UNIVERSITY
Chicago, USA
05.2023

Bachelor's Degree - Computer Science and Engineering

VIGNAN'S UNIVERSITY
Guntur, India
06.2019

Skills

  • Python
  • Java
  • Spark
  • Pyspark
  • SQL
  • Pandas
  • Numpy
  • MySQL
  • PL/SQL
  • NoSQL
  • SQL Server
  • Oracle
  • RDBMS
  • Eclipse
  • Visual Studio Code
  • HTML
  • CSS
  • JavaScript
  • Hadoop testing
  • Hive Testing
  • MRUnit
  • AWS
  • GCP
  • Windows
  • Unix
  • Linux
  • Mac OS
  • Hadoop
  • MapReduce
  • Kafka
  • Hive
  • Oozie
  • Star Schema
  • Snowflake schema
  • Symantec Data Loss Prevention (DLP)
  • Argo CD
  • Kubernetes
  • Docker
  • GIT
  • Yaml
  • Helm
  • Data Migration
  • Database Design
  • SQL Programming

Certification

GCP Certified, LinkedIn

Timeline

GCP Data Engineer

WELLCARE HEALTH PLANS
02.2024 - Current

Data Engineer

Verizon
01.2023 - 12.2023

Cloud Engineer Intern

Lewis University
06.2022 - 12.2022

Data Engineer

GLOBE ACTIVE TECHNOLOGIES LTD
08.2019 - 05.2021

Master's Degree - Computer Science, Cloud Computing

LEWIS UNIVERSITY

Bachelor's Degree - Computer Science and Engineering

VIGNAN'S UNIVERSITY
Ravindra Kavutharapu