
YASHWANTH CHOWDARY KUNTA

San Antonio, TX

Summary

  • 10+ years of experience designing, building, and maintaining scalable data pipelines to support data integration, ETL processes, and data warehousing.
  • Developed and optimized SQL queries to extract, transform, and load data from various sources into data warehouses.
  • Experience in Data engineering encompassing Requirements Analysis, Design Specification, and Testing in both Waterfall and Agile methodologies.
  • Experienced Data Engineer with expertise in designing and optimizing data pipelines using Microsoft Fabric. Proficient in integrating and managing large-scale datasets across hybrid environments. Skilled in building real-time data solutions, leveraging cloud-based architectures, and ensuring high-performance data processing with Fabric’s robust analytics and storage capabilities.
  • Fluent programming experience with Scala, Java, Python, SQL, and T-SQL; hands-on experience developing and deploying enterprise applications using major Hadoop ecosystem components such as MapReduce, YARN, Hive, and HBase.
  • Experience in extracting, transforming, and loading (ETL) data from various sources into data warehouses, as well as collecting, aggregating, and moving data from various sources using Apache Flume, Kafka, Power BI, and Microsoft SSIS.
  • Proficient in designing and managing REST APIs and integrations using the MuleSoft Anypoint Platform, with hands-on experience deploying applications across various environments.
  • Hands-on experience with Hadoop architecture and various components such as the Hadoop Distributed File System (HDFS), JobTracker, TaskTracker, NameNode, DataNode, and Hadoop MapReduce programming.
  • Experience working with the Azure Logic Apps integration tool and with data warehouses such as Oracle and SAP HANA.
  • Experience building ETL data pipelines in Azure Databricks leveraging PySpark and Spark SQL, and building orchestration in Azure Data Factory for scheduling and batch processing.
  • Hands-on experience in Azure Analytics Services - Azure Data Lake Store (ADLS), Azure Data Lake Analytics (ADLA), Azure SQL DW, Azure Data Factory (ADF), Azure Databricks (ADB), etc.
  • Orchestrated data integration and CI/CD pipelines in ADF using various activities such as Get Metadata, Lookup, ForEach, Wait, Execute Pipeline, Set Variable, and Filter, with programming experience in Python and Scala.
  • Experience working in cross-functional Agile Scrum teams, with good knowledge of PolyBase external tables in SQL DW; also involved in production support activities.
  • Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, AWS Lambda, EMR and other services of the AWS family.
  • Installed, created, and maintained CI/CD (continuous integration and deployment) pipelines, applied automation to environments and applications, and worked with various automation tools such as Git, Terraform, and Ansible.
  • Developed web-based applications using Python, Django, Qt, C++, XML, CSS3, HTML5, DHTML, JavaScript, and jQuery.
  • Proficient at developing sophisticated MapReduce systems that operate on a variety of file types, including Text, Sequence, XML, and JSON. Designed, built, and managed ELT data pipelines leveraging Airflow, Python, and GCP solutions.
  • Experienced with JSON-based RESTful web services and XML/QML-based SOAP web services; also worked on various applications using Python-integrated IDEs such as Sublime Text and PyCharm.

Overview

11 years of professional experience

Work history

Azure Data Engineer

Christus Health
San Antonio, USA
01.2024 - 03.2025
  • Company Overview: Christus Health is a not-for-profit healthcare system based in Irving, Texas, providing high-quality medical services across the U.S., Mexico, and South America
  • Analyzed and developed a modern data solution with Azure PaaS service to enable data visualization
  • Understood the application's current Production state and the impact of new installation on existing business processes
  • Implemented Azure Data Factory (ADF) extensively for ingesting data from different source systems like relational and unstructured data to meet business functional requirements
  • Integrated structured and unstructured data from various sources into Microsoft Fabric, facilitating seamless data flow and real-time analytics
  • Designed and implemented data population processes for cloud-based databases, maintaining structured data models and ensuring the availability of clean, accurate data across AWS RDS, Google Cloud SQL, and Azure platforms
  • Used Kafka as a messaging system to implement real-time streaming solutions with Spark Streaming
  • Worked on Big Data integration and analytics based on Hadoop, SOLR, PySpark, Kafka, Storm, and webMethods
  • Involved in requirement gathering, business analysis, and technical design for Hadoop and Big Data projects
  • Developed Databricks ETL pipelines using notebooks, Spark DataFrames, Spark SQL, and Python scripting
  • Managed and optimized cloud databases (e.g., AWS RDS, Google Cloud SQL) to support scalable data pipelines, ensuring efficient and cost-effective data operations
  • Developed data pipeline programs with Spark Scala APIs, performed data aggregations with Hive, and formatted data (JSON) for visualization
  • Designed and implemented Infrastructure as code using Terraform, enabling automated provisioning and scaling of cloud resources on Azure
  • Involved in various phases of Software Development Lifecycle (SDLC) of the application, like gathering requirements, design, development, deployment, and analysis of the application
  • Managed large datasets using Pandas DataFrames and SQL
  • Documented data migration procedures, scripts, and data handling processes to provide a reference for troubleshooting and future developments
  • Implemented Synapse integration with Azure Databricks notebooks, which reduced development work by about half, and achieved a performance improvement on Synapse loading by implementing a dynamic partition switch
  • Implemented Continuous Integration Continuous Delivery (CI/CD) for end-to-end automation of release pipeline using DevOps tools like Jenkins
  • Environment: Jenkins, CI/CD, DevOps, Azure Databricks, Synapse Integration, T-SQL scripting, Pandas, Terraform, Spark Scala APIs, Hive, Tableau, Spark SQL, Python scripting, Hadoop, Big Data, Snowflake

AWS Data Engineer

Goldman Sachs
Bengaluru, INDIA
08.2020 - 08.2023
  • Company Overview: Goldman Sachs is a leading global investment banking, securities, and asset management firm. With expertise in financial advisory, trading, and wealth management, the firm serves corporations, governments, institutions, and individuals worldwide
  • Developed, deployed, and managed scalable data pipelines using AWS services such as AWS Glue, AWS Lambda, and Amazon Kinesis to handle data ingestion, transformation, and loading
  • Utilized Amazon S3 to build and manage data lakes for storing structured and unstructured data, providing scalable and secure data storage
  • Integrated AWS DynamoDB using AWS Lambda to store the values of items and backup the DynamoDB streams
  • Responsible for loading data from the BDW Oracle database and Teradata into HDFS using Sqoop
  • Implemented AJAX, JSON, and JavaScript to create interactive web screens
  • Responsible for creating Hive tables, loading data, and writing hive queries
  • Configured Spark Streaming to receive ongoing data from Kafka and store the streamed data in DBFS
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts
  • Good Knowledge on architecture and components of Spark, and efficient in working with Spark Core, Spark SQL, Spark streaming and expertise in building PySpark and Spark-Scala applications for interactive analysis, batch processing and stream processing
  • Provisioned high availability of AWS EC2 instances, migrated legacy systems to AWS, and developed Terraform plugins, modules, and templates for automating AWS infrastructure
  • Worked on SQL and PL/SQL for backend data transactions and validations
  • Set up the CI/CD pipelines using Jenkins, Maven, GitHub, Chef, Terraform, and AWS
  • Created datasets from S3 using AWS Athena, created visual insights using Amazon QuickSight, monitored data quality and integrity through end-to-end testing and reverse engineering, and documented existing programs and code
  • Environment: AWS Athena, S3, SQL, PL/SQL, CI/CD, Jenkins, Maven, GitHub, Chef, Terraform, AWS, AWS EC2, PySpark, Kafka, Hive, Java, Teradata, Sqoop, DynamoDB, AWS Glue, AWS Lambda, Amazon Kinesis

Data Engineer

Lowe’s
Bengaluru, India
01.2017 - 07.2020
  • Company Overview: Lowe’s is a leading home improvement retailer, providing a wide range of products, tools, and services for homeowners, builders, and contractors
  • Part of the data and reporting team, creating insights and visualizations for the business to make decisions on
  • Designed and deployed a Kubernetes-based containerized infrastructure for data processing and analytics, leading to a 20% increase in data processing capacity
  • Wrote queries in MySQL and native SQL
  • Well versed with various aspects of ETL processes used in loading and updating Oracle data warehouse
  • Presented the project to faculty and industry experts, showcasing the pipeline's effectiveness in providing real-time insights for marketing and brand management
  • Used Python-based GUI components for front-end functionality such as selection criteria
  • Used Azure Data Factory to ingest data from log files and custom business applications, processed data on Databricks per day-to-day requirements, and loaded it into Azure Data Lake
  • Configured Spark Streaming to receive ongoing data from Kafka and store the streamed data in DBFS
  • Deployed models as a Python package, as an API for backend integration, and as services in a microservices architecture with a Kubernetes orchestration layer for the Docker containers
  • Used Python to write data into JSON files for testing Django websites, and created scripts for data modeling and data import and export
  • Led requirement gathering, business analysis, and technical design for Hadoop and Big Data projects
  • Managed relational database services in which the Azure SQL handles reliability, scaling, and maintenance
  • Integrated data storage solutions
  • Built Jenkins jobs for CI/CD infrastructure for GitHub repos
  • Created Session Beans and controller Servlets for handling HTTP requests from Talend
  • Performed data visualization and designed dashboards with Tableau, generating complex reports including charts, summaries, and graphs to interpret the findings for the team and stakeholders
  • Used Apache Airflow in a GCP Cloud Composer environment to build data pipelines, using various Airflow operators such as bash operators, Hadoop operators, Python callables, and branching operators
  • Environment: Azure, Oracle, Kafka, Python, Informatica, SQL Server, Erwin, RDS, NoSQL, Snowflake Schema, MySQL, Bash, DynamoDB, PostgreSQL, Tableau, GitHub, Linux/Unix

Data Engineer

GE Aerospace
Bengaluru, India
06.2014 - 12.2016
  • Company Overview: GE Aerospace is a leading global provider of jet engines, components, and integrated systems for commercial and military aircraft. The company specializes in advanced propulsion technologies, digital solutions, and sustainable aviation innovations
  • Hands-on experience building data pipelines in Python/PySpark/Hive SQL/Presto and monitoring data engines to define data requirements and data acquisition from both relational and non-relational databases, including Cassandra and HDFS
  • Created an ETL pipeline using Spark and Hive to ingest data from multiple sources
  • Carried out data transformation and cleansing using SQL queries, Python, and PySpark
  • Expert knowledge of Hive SQL, Presto SQL, and Spark SQL for ETL jobs, using the right technology to get the job done
  • Worked on building dashboards in Tableau with ODBC connections to different sources such as the BigQuery and Presto SQL engines, and developed stored procedures in MS SQL to fetch data from different servers using FTP and process the files to update the tables
  • Involved in using SAP, with transactions done in the SAP SD module, for handling the client's customers and generating sales reports
  • Designed and configured databases, back-end applications, and programs
  • Managed large datasets using Pandas DataFrames and SQL
  • Worked with AWS Terraform templates in maintaining the infrastructure as code
  • Built and maintained Docker container clusters managed by Kubernetes on Linux using Bash, Git, and Docker
  • Implemented a continuous delivery (CI/CD) pipeline with Docker for custom application images in the cloud using Jenkins
  • Environment: Python, PySpark, Hive SQL, Presto SQL, Spark, Hive, Cassandra, HDFS, Kubernetes, Linux, Git, Docker, SAP, Tableau, AWS, Terraform, Pandas

Education

Master's - Information Technology Management

Webster University
San Antonio, Texas

Skills

  • Cloud Technologies: AWS, Azure, MuleSoft, Salesforce, Mule ESB, Design Center, Anypoint Exchange, Runtime Manager, Anypoint Studio, API Manager, Anypoint Monitoring, Amazon S3, EMR, Redshift, Lambda, Athena, Composer, BigQuery
  • Script Languages: Python, Shell Script (bash, shell)
  • Programming Languages: Java, Python, Hibernate, JDBC, JSON, HTML, CSS, RAML
  • Databases: Oracle, MySQL, SQL Server, PostgreSQL, HBase, Snowflake, Cassandra, MongoDB
  • Version controls and Tools: Git, Maven, SBT, CBT
  • Web/Application server: Apache Tomcat, WebLogic, WebSphere
  • AWS Ecosystem: S3 Bucket, Athena, Glue, EMR, Redshift, Data Lake, AWS Lambda, Kinesis
  • Azure Ecosystem: Azure Data Lake, ADF, Databricks, Azure SQL, Azure Functions
  • Operating Systems: Windows, Unix, Linux
  • IDE Methodologies: Eclipse, Dreamweaver
  • Hadoop Components / Big Data: HDFS, Hue, MapReduce, Pig, Hive, HCatalog, HBase, Sqoop, Impala, Zookeeper, Flume, Kafka, YARN, Cloudera Manager, Kerberos, PySpark, Airflow, Snowflake, Spark components, Batch Processing
  • Visualization & ETL tools: Tableau, Power BI, Informatica, Talend
  • Tools: TOAD, SQL Developer, Azure Data Studio, SoapUI, SSMS, GitHub, SharePoint, Visual Studio, Teradata SQL Assistant
  • ETL/Middleware Tools: Talend, SSIS, Azure Data Factory, Azure Databricks, MuleSoft, Microsoft Fabric, data lake management

OBJECTIVE:

A results-driven Data Engineer with 10+ years of experience in data integration, designing, implementing, and maintaining data pipelines and data warehousing solutions. Proven expertise in optimizing data processes, working with large datasets, and leveraging advanced data technologies.
