Experienced, results-oriented, resourceful, and problem-solving Data Engineer with 8 years of diverse experience in the Information Technology field, including the development and implementation of applications in cloud environments for storing, querying, and processing data.
Responsive expert experienced in monitoring database performance, troubleshooting issues, and optimizing database environments. Possesses strong analytical skills, excellent problem-solving abilities, and a deep understanding of database technologies and systems. Equally confident working independently or collaboratively, with excellent communication skills.
Overview
8 years of professional experience
1 Certification
Work History
Senior Data Engineer
Verana Health
Charlotte, NC
10.2021 - Current
Led the design and implementation of an end-to-end, scalable, secure data pipeline architecture on Databricks and AWS, resulting in a 40% improvement in data processing efficiency.
Collaborated with cross-functional teams to understand business requirements and translate them into technical solutions aligned with healthcare data compliance standards.
Led the integration of diverse data sources into Databricks, utilizing Unity Catalog to manage metadata and access controls, ensuring seamless data availability for analytics and reporting across the organization.
Automated ETL workflows using Apache Airflow and Databricks, resulting in a 25% improvement in data availability and freshness for analytics.
Conducted performance tuning and optimization of Databricks clusters to enhance data processing speed and efficiency.
Crafted generic Spark user-defined functions (UDFs) to execute record-level business logic (see the first sketch after this section).
Built an event-driven pipeline leveraging SNS, SQS, AWS Lambda, Glue, and AWS Step Functions.
Orchestrated seamless integration of Databricks notebooks and jobs into CI/CD pipelines for automated testing and deployment.
Implemented dependency management within Airflow so that Databricks jobs execute in a coherent, orderly sequence (see the DAG sketch after this section).
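A minimal sketch of the generic record-level UDF pattern described above; the function, column, and table names are hypothetical illustrations, not the production code.

```python
# Minimal sketch of a record-level Spark UDF (names are hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").getOrCreate()

@F.udf(returnType=StringType())
def normalize_patient_id(raw_id):
    """Record-level business logic: strip padding and enforce a fixed-width ID."""
    if raw_id is None:
        return None
    return raw_id.strip().upper().zfill(10)

df = spark.table("raw.encounters")  # hypothetical source table
df = df.withColumn("patient_id", normalize_patient_id(F.col("patient_id_raw")))
```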
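A minimal sketch of Airflow dependency management gating Databricks jobs, assuming the Databricks provider package is installed; the DAG name, task IDs, and job IDs are hypothetical.

```python
# Sketch: Airflow task dependencies enforcing an ordered run of Databricks jobs.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

with DAG(
    dag_id="databricks_pipeline",          # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = DatabricksRunNowOperator(task_id="ingest", job_id=101)      # hypothetical job IDs
    transform = DatabricksRunNowOperator(task_id="transform", job_id=102)
    publish = DatabricksRunNowOperator(task_id="publish", job_id=103)

    # Each downstream task runs only after its upstream dependency succeeds.
    ingest >> transform >> publish
```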
Senior Data Engineer
Nike
Beaverton, OR
04.2020 - 10.2021
Worked with the APLA (Asia Pacific & Latin America) Enterprise Data Analytics team on data ingestion, transformation, and consumption views for the Nike Direct forecast, built on AWS using EMR, S3, Lambda, PySpark, Hive, Airflow, and Snowflake.
Integrated various source systems and analyzed data to support pre-channel and post-channel sales transformation.
Engineered a robust pipeline to ingest, aggregate, and load consumer response data into Hive external tables on AWS S3, then published the data to Snowflake as Tableau dashboard data sources.
Consumed near-real-time data using Spark Streaming with Kafka as a high-throughput data pipeline system (see the streaming sketch after this section).
Handled JSON datasets, crafting custom Python functions to parse JSON data with Spark.
Built an AWS Lambda function with boto3 to automate deregistration of unused AMIs across all application regions, reducing EC2 resource costs (sketched after this section).
Created, debugged, scheduled, and monitored Airflow jobs for ETL batch processing, ensuring seamless loading into Snowflake for analytical processes.
Orchestrated job schedules through Airflow, including custom operators for enhanced flexibility and efficiency (see the operator sketch after this section).
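A minimal sketch of near-real-time consumption from Kafka with JSON parsing in Spark, shown here with the Structured Streaming API; the broker, topic, schema, and S3 paths are hypothetical.

```python
# Sketch: consume JSON events from Kafka and parse them with an explicit schema.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

event_schema = StructType([                 # hypothetical event layout
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
       .option("subscribe", "consumer-response")           # hypothetical topic
       .load())

# Kafka delivers bytes; cast the value to a string, then parse the JSON payload.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", event_schema).alias("e"))
          .select("e.*"))

query = (events.writeStream
         .format("parquet")
         .option("path", "s3://bucket/events/")             # hypothetical paths
         .option("checkpointLocation", "s3://bucket/checkpoints/")
         .start())
query.awaitTermination()
```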
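A minimal sketch of the boto3 Lambda for AMI cleanup; the "in use" check (AMIs referenced by existing instances) is a simplifying assumption about the original cost logic.

```python
# Sketch: Lambda that deregisters self-owned AMIs not referenced by any instance.
import boto3

def lambda_handler(event, context):
    ec2_client = boto3.client("ec2")
    regions = [r["RegionName"] for r in ec2_client.describe_regions()["Regions"]]
    for region in regions:
        ec2 = boto3.client("ec2", region_name=region)
        # All AMIs owned by this account in the region.
        images = ec2.describe_images(Owners=["self"])["Images"]
        # AMIs currently referenced by instances (running or stopped).
        in_use = set()
        for page in ec2.get_paginator("describe_instances").paginate():
            for reservation in page["Reservations"]:
                for instance in reservation["Instances"]:
                    in_use.add(instance["ImageId"])
        for image in images:
            if image["ImageId"] not in in_use:
                ec2.deregister_image(ImageId=image["ImageId"])
```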
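A minimal sketch of a custom Airflow operator of the kind described above; the class name, connection ID, and row-count check are hypothetical, assuming the Snowflake provider package.

```python
# Sketch: custom operator that fails a task if a Snowflake load produced too few rows.
from airflow.models.baseoperator import BaseOperator

class SnowflakeLoadCheckOperator(BaseOperator):
    """Hypothetical operator: verify a minimum row count after a load."""

    def __init__(self, table: str, min_rows: int = 1, **kwargs):
        super().__init__(**kwargs)
        self.table = table
        self.min_rows = min_rows

    def execute(self, context):
        # Imported inside execute() so the scheduler can parse the DAG cheaply.
        from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook

        hook = SnowflakeHook(snowflake_conn_id="snowflake_default")
        count = hook.get_first(f"SELECT COUNT(*) FROM {self.table}")[0]
        if count < self.min_rows:
            raise ValueError(f"{self.table} has {count} rows; expected >= {self.min_rows}")
        self.log.info("%s row count OK: %s", self.table, count)
```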
Data Engineer
Aetna
Hartford, CT
05.2018 - 04.2020
Worked with data stewardship and analytics teams to migrate existing on-prem data pipelines to GCP using cloud-native tools such as GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil, the bq command-line utility, Cloud Composer, PySpark, Python, Dataproc, and BigQuery.
Handled large datasets using partitioning, Spark in-memory capabilities, and efficient joins and transformations during the ingestion process itself.
Developed a generic Python script to validate data between source files and BigQuery tables, and maintained an archival process in a GCS bucket (see the validation sketch after this section).
Built and architected multiple end-to-end ETL and ELT data pipelines for data ingestion and transformation in GCP.
Created BigQuery authorized views for row-level security and for exposing data to other teams (see the second sketch after this section).
Used Cloud Composer to run end-to-end data pipelines, scheduling jobs and their dependencies.
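A minimal sketch of source-to-BigQuery row-count validation, assuming the source is a headered CSV in GCS; the bucket, blob, and table names are hypothetical.

```python
# Sketch: compare a GCS source file's row count against its BigQuery target table.
from google.cloud import bigquery, storage

def validate_row_counts(bucket_name: str, blob_name: str, table_id: str) -> bool:
    # Count data rows in the source CSV (line count minus the header).
    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    source_rows = blob.download_as_text().count("\n") - 1

    # Count rows in the target BigQuery table.
    client = bigquery.Client()
    result = client.query(f"SELECT COUNT(*) AS n FROM `{table_id}`").result()
    target_rows = next(iter(result)).n

    return source_rows == target_rows

# Hypothetical usage:
# ok = validate_row_counts("landing-bucket", "claims/2020-01.csv", "proj.ds.claims")
```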
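A minimal sketch of creating an authorized view with the google-cloud-bigquery client; the project, datasets, and row filter are hypothetical.

```python
# Sketch: expose a filtered view of a private dataset without granting direct access.
from google.cloud import bigquery

client = bigquery.Client()

# The view bakes a row-level restriction into its query.
view = bigquery.Table("my-project.shared_views.claims_ne")   # hypothetical names
view.view_query = """
    SELECT claim_id, amount
    FROM `my-project.private_data.claims`
    WHERE region = 'NE'
"""
view = client.create_table(view)

# Authorize the view against the private dataset so consumers query the view only.
dataset = client.get_dataset("my-project.private_data")
entries = list(dataset.access_entries)
entries.append(bigquery.AccessEntry(None, "view", view.reference.to_api_repr()))
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])
```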
Big Data Engineer
Bank of America
Charlotte, NC
02.2016 - 05.2018
Worked with the Wholesale Loss Forecasting team, part of Bank of America's Global Risk Analytics platform, to support federally mandated CCAR (Comprehensive Capital Analysis & Review) cycles.
Developed PySpark scripts that reduced organizational costs by 30% by migrating legacy Teradata and Oracle systems into a Hadoop data lake.
Loaded data from source systems (Teradata and Oracle) into HDFS using Sqoop and then into partitioned Hive tables.
Migrated existing HiveQL logic to Spark SQL applications for data transformation and aggregation, writing results back to Hive tables (see the Spark SQL sketch after this section).
Implemented dimensional data modeling to deliver multidimensional star and snowflake schemas, normalizing dimension tables as appropriate in the data lake.
Worked extensively with Impala for low-latency querying of Hive tables exposed to end users.
Developed application-specific common utilities in Python to perform data quality (DQ) checks before data is consumed by subsequent processes or published to downstream users (see the DQ sketch after this section).
Developed Oozie workflows and sub-workflows to orchestrate Sqoop scripts, Hive queries, and Spark scripts, automating the ETL process.
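A minimal sketch of a HiveQL aggregation re-expressed in Spark SQL; the database and table names are hypothetical.

```python
# Sketch: run a former HiveQL aggregation as Spark SQL and write back to Hive.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-to-sparksql")
         .enableHiveSupport()          # lets Spark read and write Hive tables
         .getOrCreate())

agg = spark.sql("""
    SELECT portfolio, SUM(exposure) AS total_exposure
    FROM risk.loans                    -- hypothetical source table
    GROUP BY portfolio
""")

# Overwrite the target Hive table with the aggregated result.
agg.write.mode("overwrite").saveAsTable("risk.portfolio_exposure")
```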
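A minimal sketch of a generic PySpark DQ utility of the kind described above; the specific checks (key uniqueness, null completeness) are hypothetical examples.

```python
# Sketch: reusable data-quality checks run before data is published downstream.
from pyspark.sql import DataFrame
from pyspark.sql import functions as F

def run_dq_checks(df: DataFrame, key_cols: list, not_null_cols: list) -> dict:
    """Return a dict of named check results; callers fail the pipeline on any False."""
    results = {}
    # Uniqueness: no duplicate business keys.
    dupes = df.groupBy(key_cols).count().filter(F.col("count") > 1).count()
    results["unique_keys"] = dupes == 0
    # Completeness: required columns contain no nulls.
    for col in not_null_cols:
        results[f"{col}_not_null"] = df.filter(F.col(col).isNull()).count() == 0
    return results

# Hypothetical usage:
# checks = run_dq_checks(loans_df, ["loan_id"], ["portfolio", "exposure"])
# assert all(checks.values()), f"DQ failures: {checks}"
```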
Education
Master of Science - Computer Science
Monmouth University
West Long Branch, NJ
12.2015
Bachelor of Technology - Computer Science and Engineering