Jeshwanth Reddy Borra

Eden Prairie, MN

Summary

Experienced Data Engineer with 6 years of expertise in designing, building, and optimizing data pipelines and architectures across AWS, Azure, and GCP, seeking to apply cloud-based data engineering skills to drive data-driven decision-making.

Results-focused data engineer with expertise in designing, building, and optimizing complex data pipelines and ETL processes. Strong in SQL, Python, and cloud platforms, ensuring seamless data integration and robust data solutions. Excels in collaborative environments, adapts quickly to evolving needs, and drives team success.

Overview

6 years of professional experience
3 certifications

Work History

Azure Data Engineer II

CHS Inc.
07.2024 - Current
  • Company Overview: CHS Inc. is a leading global agribusiness owned by farmers, ranchers, and cooperatives. Implemented data integration solutions to unify data across cloud platforms, used transformation tools (e.g., dbt, Apache Spark) for data cleansing, aggregation, and transformation, and automated data workflows with orchestration tools such as Apache Airflow and cloud-native services.
  • Created DataStage ETL jobs to continuously populate the data warehouse from source systems such as ODS, flat files, and Parquet. Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Assessed the infrastructure needs for each application and deployed it on the Azure platform.
  • Actively involved in designing and developing data ingestion, aggregation, and integration in the Hadoop environment.
  • Integrated Kafka with Spark Streaming for real-time data processing. Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks (see the sketch after this list).
  • Worked on Spark to improve the performance and optimization of existing algorithms in Databricks using Spark context, Spark SQL, DataFrames, and pair RDDs. Processed schema-oriented and non-schema-oriented data using Scala and Spark. Integrated Azure Data Factory with Blob Storage to move data through Databricks for processing and then to Azure Data Lake Storage and Azure SQL Data Warehouse. Developed ETL pipelines between data warehouses using a combination of Python, Snowflake, and SnowSQL, writing SQL queries against Snowflake.
  • Developed Python scripts for parsing JSON documents and loading data into databases.
  • Implemented batch processing of streaming data using Spark Streaming. Developed Apache Spark data processing projects to handle data from various RDBMS and streaming sources, utilizing Scala and Python.
  • Performed data processing in Azure Databricks after data ingestion into Azure services such as Azure Data Lake, Azure Storage, Azure SQL DB, and Azure SQL DW.
  • Conducted a Proof of Concept to devise cloud strategies leveraging Azure for optimal utilization.
  • Demonstrated expertise in utilizing various Azure Cloud Services, including PaaS and IaaS, Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake. Developed tabular models on Azure Analysis Services to meet business reporting requirements.
  • Proficient in working with Azure Blob and Data Lake Storage and loading data into Azure Synapse Analytics.
  • Built Spark programs using Scala and APIs for performing transformations and operations on RDDs.
  • Utilized Scala and Spark-SQL / Streaming to create Spark code for accelerated data processing.
  • Generated capacity planning reports using Python packages like NumPy and Matplotlib.
  • Employed Hadoop scripts to manipulate and load data from the Hadoop File System.
  • Contributed to the design and development of SnapLogic pipelines for data extraction from the Data Lake to the staging server, followed by data processing using Informatica for ingestion into Teradata within the EDW.
  • Addressed Teradata utility failures and handled errors related to SnapLogic, Informatica, and Teradata by implementing necessary code overrides. Transferred data from Azure Blob Storage to the Snowflake database.
  • Proficient in using Snowflake utilities, SnowSQL, and Snowpipe, and applying big data modeling techniques using Python.
  • Created data tables using PyQt to display customer and policy information and to add, delete, and update customer records.
  • Conducted data blending and data preparation using Alteryx and SQL for Tableau consumption and published data sources to Tableau Server. Used Docker for managing the application environments.
  • Utilized Elasticsearch and Kibana for indexing and visualizing the real-time analytics results, enabling stakeholders to gain actionable insights quickly. Involved in various phases of Software Development Lifecycle (SDLC) of the application, like gathering requirements, design, development, deployment, and analysis of the application.
  • Demonstrated skill in parameterizing dynamic SQL to prevent SQL injection vulnerabilities and ensure data security.
  • Created and triggered automated builds and continuous deployments (CI/CD) using Jenkins/Looper and the OneOps cloud.
  • Migrated Oracle E-Business Suite to Google Cloud SQL.
  • Designed and implemented end-to-end data pipelines in Azure Synapse Analytics using dedicated SQL pools and serverless queries. Conducted query optimization and performance tuning tasks, such as query profiling, indexing, and utilizing Snowflake's automatic clustering to improve query response times and reduce costs.
  • Tools Used: Azure, Azure Data Lake, Azure Synapse Analytics, Azure Blob Storage, Azure Data Factory, CI/CD, Docker, Elasticsearch, ETL, Hive, Jenkins, Kafka, Oracle, PySpark, Scala, Snowflake, Spark, Spark SQL, Spark Streaming, SQL, Tableau
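
A minimal sketch of the kind of Kafka-to-Databricks streaming pipeline referenced above; the broker address, topic name, event schema, and ADLS paths are hypothetical placeholders, not details from the original work:

    # Minimal sketch: consume Kafka events with Spark Structured Streaming and
    # land them in Azure Data Lake Storage as Delta. Broker, topic, schema, and
    # paths below are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import DoubleType, StringType, StructField, StructType

    spark = SparkSession.builder.appName("kafka_to_adls").getOrCreate()

    event_schema = StructType([
        StructField("order_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
        .option("subscribe", "orders")                      # placeholder topic
        .load()
        .select(from_json(col("value").cast("string"), event_schema).alias("e"))
        .select("e.*")
    )

    (
        events.writeStream.format("delta")
        .option("checkpointLocation", "abfss://checkpoints@acct.dfs.core.windows.net/orders")
        .start("abfss://bronze@acct.dfs.core.windows.net/orders")
    )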

AWS Data Engineer II

U.S. Bank
11.2023 - 06.2024
  • Company Overview: U.S. Bancorp is an American multinational financial services company. Built scalable and automated ETL pipelines using AWS Glue and Step Functions. Utilized AWS Lambda for event-driven data transformations. Implemented data lakes on Amazon S3, optimizing for cost and performance. Worked with Redshift for warehousing and query optimization.
  • Worked on developing ETL processes (DataStage Open Studio) to load data from multiple data sources into HDFS using Flume and Sqoop, and performed structural modifications using MapReduce and Hive. Worked on Lambda functions that aggregate data from incoming events and store the results in Amazon DynamoDB (see the sketch after this list).
  • Used Hive to analyze data ingested into HBase via Hive-HBase integration and computed various metrics for reporting on the dashboard. Analyzed data using the Hadoop components Hive and Pig.
  • Worked with Docker containers, developing the images and hosting them in an Artifactory.
  • Spearheaded HBase setup and utilized Spark and Spark SQL to develop faster data pipelines, resulting in a 60% reduction in processing time and improved data accuracy. Provisioned highly available AWS EC2 instances, migrated legacy systems to AWS, and developed Terraform plugins, modules, and templates for automating AWS infrastructure.
  • Involved in the entire lifecycle of projects, including design, development, deployment, testing, implementation, and support. Developed metrics based on SAS scripts on the legacy system and migrated them to Snowflake (AWS S3).
  • Designed and configured databases, back-end applications, and programs. Managed large datasets using Pandas DataFrames and SQL. Built Jenkins jobs for CI/CD infrastructure for GitHub repos. Implemented automated data pipelines for data migration, ensuring a smooth and reliable transition to the cloud environment.
  • Experienced in creating Kubernetes replication controllers, clusters, and label services to deploy microservices in Docker. Designed and deployed multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM) with an emphasis on high availability, fault tolerance, and auto-scaling in AWS CloudFormation.
  • Built robust data ingestion pipelines using Logstash, Filebeat, and Kafka Connect to stream real-time logs and events into Elasticsearch clusters. Used tools such as AWS Glue, AWS Lambda, and Apache Spark for large-scale data processing and transformation, and developed ETL/ELT processes to prepare data for analytics and reporting.
  • Ensured data security by implementing encryption, access controls, and monitoring using AWS IAM, AWS KMS, and AWS CloudTrail. Worked with IoT data streams from medical devices and sensors.
  • Worked closely with data scientists to provide clean, structured data for machine learning models and advanced analytics.
  • Supported business analysts by enabling access to data through tools such as Amazon QuickSight and Tableau.
  • Used AWS CloudWatch, AWS X-Ray, and AWS Cost Explorer to troubleshoot issues and optimize resource usage.
  • Managed and optimized AWS cloud infrastructure, including EC2 instances, S3 buckets, and networking components.
  • Used Infrastructure as Code (IaC) tools such as AWS CloudFormation and Terraform for deployment and management.
  • Used AWS IoT Core, AWS Kinesis, and Apache Kafka for real-time data processing and analytics.
  • Tools Used: AWS, CI/CD, Docker, DynamoDB, EC2, Elasticsearch, ETL, Flume, Git, HBase, HDFS, Hive, Jenkins, Kafka, Kubernetes, Lambda, MapReduce, Pandas, Pig, RDS, S3, SAS, Snowflake, Spark, Spark SQL, SQL, Sqoop
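
A minimal sketch of an event-aggregating AWS Lambda handler of the kind described above, storing running totals in DynamoDB with boto3; the table name, partition key, and payload fields are hypothetical placeholders:

    # Minimal sketch: aggregate amounts from incoming Kinesis-style records and
    # store running totals in DynamoDB. Table name and payload fields are
    # hypothetical placeholders.
    import base64
    import json
    from decimal import Decimal

    import boto3

    table = boto3.resource("dynamodb").Table("event_aggregates")  # placeholder table

    def handler(event, context):
        totals = {}
        for record in event.get("Records", []):
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            key = payload["customer_id"]  # placeholder partition key
            totals[key] = totals.get(key, 0.0) + float(payload["amount"])
        for customer_id, total in totals.items():
            table.update_item(
                Key={"customer_id": customer_id},
                UpdateExpression="ADD running_total :t",
                ExpressionAttributeValues={":t": Decimal(str(total))},
            )
        return {"aggregated_keys": len(totals)}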

GCP Data Engineer II

CRISIL Limited
04.2020 - 07.2023
  • Company Overview: CRISIL Limited is a global analytics company that provides ratings, data, research, analytics, and solutions to the financial services industry. Designed and implemented efficient data models for BigQuery to support analytics and reporting needs. Optimized schemas for performance, scalability, and cost-efficiency.
  • Managed, configured, and scheduled resources across the cluster using Azure Kubernetes Service.
  • Ensured data quality and report accuracy by implementing validation scripts and schema checks in the pipeline feeding Data Studio. Responsible for building scalable distributed data solutions using Hadoop.
  • Worked on NoSQL databases such as HBase and integrated them with PySpark for processing and persisting real-time streaming data. Experienced in GCP Dataproc, GCS, Cloud Functions, and BigQuery.
  • Achieved 70% faster EMR cluster launch and configuration, optimized Hadoop job processing by 60%, improved system stability, and utilized Boto3 for seamless file writing to S3 bucket.
  • Created Amazon VPC to create public-facing subnet for web servers with internet access, and backend databases & application servers in a private-facing subnet with no Internet access.
  • Developed an end-to-end solution that involved ingesting sales data from multiple sources, transforming and aggregating it using Azure Databricks, and visualizing insights through Tableau dashboards.
  • Built and debugged Python and Bash scripts in Cloud Shell, accelerating ETL pipeline development and reducing environment configuration time by 60%. Managed large datasets using Pandas DataFrames and SQL.
  • Created batch and real time pipelines using Spark as the main processing framework.
  • Involved in building database models, APIs, and views using Python to deliver interactive web-based solutions.
  • Automated deployment and management of GCP resources using Google Cloud SDK, streamlining infrastructure provisioning for data pipelines across Dataproc, BigQuery, and GCS. Deployed Azure Functions and other dependencies into Azure to automate Azure Data Factory pipelines for Data Lake jobs.
  • Downloaded BigQuery data into pandas or Spark DataFrames for advanced ETL capabilities.
  • Designed and optimized complex SQL queries and stored procedures for PostgreSQL and MySQL to support real-time analytics and reporting, improving query performance by 45%. Worked with GCP services including Cloud Storage, Dataproc, Dataflow, and BigQuery, as well as EMR, S3, Glacier, and EC2 with EMR clusters.
  • Built a program with Python and Apache Beam and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables (see the sketch after this list). Migrated previously written cron jobs to Airflow/Cloud Composer in GCP.
  • Tools Used: Airflow, Apache Beam, APIs, Azure, BigQuery, Data Factory, Data Lake, EC2, EMR, ETL, GCP, HBase, Kubernetes, MySQL, PostgreSQL, PySpark, Python, S3, SDK, Spark, SQL, Tableau, VPC
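
A minimal sketch of the kind of Beam validation described above, comparing the row count of a raw GCS file against the row count of the corresponding BigQuery table; the bucket, file, and table names are hypothetical, and Dataflow execution would add --runner=DataflowRunner plus project and region options:

    # Minimal sketch: compare the row count of a raw GCS file against the row
    # count of its BigQuery table. Bucket and table names are placeholders.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def report(kv):
        # kv looks like ("row_count", [("file", n), ("bigquery", m)])
        counts = dict(kv[1])
        status = "MATCH" if counts.get("file") == counts.get("bigquery") else "MISMATCH"
        print(f"{counts} -> {status}")

    with beam.Pipeline(options=PipelineOptions()) as p:
        file_count = (
            p
            | "ReadRaw" >> beam.io.ReadFromText("gs://raw-bucket/orders.csv", skip_header_lines=1)
            | "CountFile" >> beam.combiners.Count.Globally()
            | "TagFile" >> beam.Map(lambda c: ("row_count", ("file", c)))
        )
        bq_count = (
            p
            | "ReadBQ" >> beam.io.ReadFromBigQuery(table="my-project:dataset.orders")
            | "CountBQ" >> beam.combiners.Count.Globally()
            | "TagBQ" >> beam.Map(lambda c: ("row_count", ("bigquery", c)))
        )
        (
            (file_count, bq_count)
            | "Merge" >> beam.Flatten()
            | "Group" >> beam.GroupByKey()
            | "Report" >> beam.Map(report)
        )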

Data Engineer II

Merck Group
06.2019 - 03.2020
  • Company Overview: Merck is a global healthcare company that develops and sells innovative health solutions for people and animals. Designed, built, and maintained robust and scalable data pipelines for processing large volumes of data. Ensured data flows efficiently between different systems and applications using ETL (Extract, Transform, Load) processes. Designed and optimized database structures to store and retrieve data efficiently.
  • Designed and developed ETL pipelines for real-time data integration and transformation using Kubernetes and Docker.
  • Developed Sqoop scripts to migrate data from Oracle to Big data Environment. Effectively scheduled and managed jobs on Azure virtual machines using Control-M, optimizing resource allocation and ensuring reliable execution.
  • Experienced in using Kafka as a messaging system to implement real-time streaming solutions with Spark Streaming.
  • Developed data pipelines and workflows using Azure Databricks to process and transform large volumes of data, utilizing programming languages such as Python, Scala, and SQL.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Actively Participated in all phases of the Software Development Life Cycle (SDLC) from implementation to deployment.
  • Wrote and executed various MySQL database queries from Python using the Python-MySQL connector and the MySQLdb package. Involved in monitoring and scheduling the pipelines using triggers in Azure Data Factory.
  • Instantiated, created, and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications. Worked on data management disciplines including data integration, modeling, and other areas directly relevant to business intelligence and business analytics development.
  • Worked on scheduling all jobs using Airflow scripts in Python, adding different tasks to DAGs and Lambda (see the sketch after this list).
  • Developed a fully automated continuous integration system using Git, Jenkins, MySQL, and custom tools developed in Python and Bash. Developed Kibana dashboards based on Logstash data and integrated different source and target systems into Elasticsearch for near real-time log analysis of end-to-end transactions.
  • Developed metrics based on SAS scripts on the legacy system and migrated them to Snowflake (Google Cloud).
  • Tools Used: Azure, CI/CD, Docker, DynamoDB, EC2, Elasticsearch, ETL, Azure Data Factory, Git, Jenkins, Kafka, Kubernetes, Data Lake, MySQL, Oracle, Python, S3, SAS, Scala, Spark, Spark SQL, Spark Streaming, SQL, Sqoop
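
A minimal sketch of an Airflow DAG of the kind used for the job scheduling described above, with two PythonOperator tasks chained in sequence; the DAG id, schedule, and task bodies are hypothetical placeholders:

    # Minimal sketch: an Airflow DAG scheduling two Python tasks in sequence.
    # DAG id, schedule, and task bodies are hypothetical placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pulling records from the source system")  # placeholder extract step

    def load():
        print("loading records into the warehouse")  # placeholder load step

    with DAG(
        dag_id="daily_metrics",
        start_date=datetime(2020, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)
        extract_task >> load_task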

Education

Master's - Information Technology

Concordia University
St. Paul, Minnesota
12.2024

Skills

  • AWS Ecosystem: S3, Athena, Glue, EMR, Redshift, Data Lake, AWS Lambda, Kinesis
  • Azure Ecosystem: Azure Data Lake, ADF, Databricks, Azure SQL
  • Google Cloud Platform: GCP Cloud Storage, BigQuery, Composer, Cloud Dataproc, Cloud SQL, Cloud Functions, Cloud Pub/Sub, Dataflow
  • Programming and Scripting: Python, PySpark, SQL, HiveQL, Shell Scripting, Scala, HTML
  • Big Data Ecosystem: HDFS, NiFi, MapReduce, Oozie, Hive/Impala, Pig, Sqoop, ZooKeeper, HBase, Spark, Scala, Kafka, Apache Flink, YARN, Cassandra, Cloudera, Hortonworks
  • Database Management: Snowflake, Teradata, Redshift, MySQL, SQL Server, Oracle, DynamoDB
  • Operating Systems: Windows, Unix, Linux
  • Reporting and ETL Tools: Tableau, Power BI, Informatica, Airflow
  • ETL development
  • Data pipeline design
  • Data modeling
  • Data warehousing
  • SQL expertise
  • Hadoop ecosystem

Certification

  • AWS Data Engineer Associate
  • Azure Data Fundamentals
  • Google Cloud Certified Associate Cloud Engineer
