Chandu Somepalli

Arbutus, MD

Summary

A self-motivated graduate with a Master's degree in Data Science from UMBC and industry experience as a Software Engineer. Specialized in increasing the efficiency, accuracy, and utility of internal data processing. Extensive exposure to building regression models, applying predictive data modeling, and analyzing data mining algorithms to deliver insights and implement action-oriented solutions to complex business problems.

Overview

2 years of professional experience
1 Certification

Work History

Data Engineer

Artificial Inventions LLC
04.2022 - 09.2024

• Contributed to the design and architecture of the complete application using PySpark and Snowflake.

• Developed the main data processing module using PySpark, integrating it with other modules stored in Snowflake.

• Tested and delivered the application with high consistency and reliability, ensuring efficient data pipelines and transformation processes.

• Enhanced the application based on market changes and handled high-volume production data with optimized PySpark operations and Snowflake performance tuning.

• Created Databricks Workflow to automate PySpark jobs and Snowflake queries.

• Designed and maintained metadata for Snowflake tables and schemas to support data lineage and governance.

• Analyzed and transformed data using PySpark and Snowflake SQL; managed and monitored Spark job logs and Snowflake query history for performance insights and troubleshooting.

Data Migration Lead – Hive to BigQuery Migration Project:

• Led the migration of data processing scripts from Hive HQL to BigQuery SQL, ensuring compatibility with GCP’s serverless environment.

• Analyzed existing Hive scripts to understand data transformation logic, identifying any Hive-specific functions and converting them to equivalent BigQuery SQL functions.

• Developed and optimized BigQuery SQL queries to replicate Hive transformations while improving performance and leveraging BigQuery’s features, such as partitioning and clustering.

• Utilized Dataflow for migrating data pipelines and transforming data, allowing a seamless transition from Hive to BigQuery.

• Created Airflow DAGs to schedule and automate data loading jobs from GCS to BigQuery, ensuring smooth data processing workflows.

• Ensured data quality and integrity by implementing testing frameworks to compare output data between Hive and BigQuery environments.

• Documented the end-to-end migration process, including best practices, limitations, and optimizations, to support ongoing maintenance and scalability of BigQuery operations.

• Collaborated with cross-functional teams to train team members on BigQuery, facilitating knowledge transfer and adapting to the GCP environment.

Education

Master of Professional Studies - Data Science

University of Maryland, Baltimore County
Baltimore, MD
12-2021

Bachelor of Technology - Electrical, Electronics and Communications Engineering

SRM Institute of Science and Technology
Chennai
05-2019

Skills

  • Python
  • NumPy
  • Pandas
  • scikit-learn
  • Hadoop
  • PySpark
  • MySQL
  • SQL
  • Tableau
  • MS Excel

Certification

  • NVIDIA DLI Certificate - Fundamentals of Deep Learning for Computer Vision, 05/01/20, 909810fe630b41f1a054b4256eabc0ec
  • Certified Microsoft Office Specialist, 03/01/15, C9bo-DTez

Activities

  • Technical Fest Organizer (Technolites 2k16), Event Organizer, 01/01/16
  • Zonal Programming Contest, IIT Bombay, 01/01/15

Awards

  • Aquamarine Award, Optum Global Solutions - Enterprise Monitoring, 12/01/18
  • C Programming Quiz Winner, Computer Engineers Technical Association, 01/01/15
