Sumanth Obilineni

Ashburn, VA

Summary

Data Engineer with over 7 years of experience designing and developing ETL pipelines using Databricks. Proven expertise in big data ecosystems, encompassing data aggregation, querying, storage, and analysis. Achieved a 30% reduction in release time through optimized CI/CD pipelines; proficient in Azure cloud services and DevOps practices. Strong analytical skills complemented by a solid understanding of big data technologies such as Spark, Hive, and Kafka for real-time and batch processing.

Overview

8
years of professional experience

Work History

Data Engineer

IQVIA
Philadelphia, PA
03.2024 - Current
  • Company Overview: MedTech | Philadelphia, PA
  • Designed pipeline architecture covering supplier data retrieval from multiple SFTP locations, quality-checking the data, transforming it to a common format, running analysis, and providing insights.
  • Created automated Airflow pipelines to load data from SFTP to Azure Blob Storage and build medallion-architecture tables in Databricks.
  • Responsible for creating a custom SFTP sensor in Airflow to start the pipelines when files are placed in SFTP.
  • Created a QC framework on partner data to check outliers, required columns, and column fill percentages.
  • Extended the QC framework to detect missing data and raise alerts when outliers are found, based on historical data trends.
  • Involved in migrating hospital claims data from HDFS to Azure Databricks.
  • Reduced the time taken to run the jobs by 80% using Spark and cloud practices.
  • Created Airflow pipelines to periodically DistCp history and delta data to Azure Blob Storage.
  • Created email and logging utilities to send alerts and write log messages to Azure Log Analytics.
  • Managed metadata structures needed for building reusable and generic ETL components using ADF, Databricks Jobs.
  • Launched Databricks workspaces using Terraform.
  • Automated tasks to test notebooks, or run Spark jobs on Databricks clusters using Azure DevOps and GitHub Actions.
  • Created Dev, Test, and Prod environments for different stages of development.
  • Each environment can have different configurations, such as cluster sizes, libraries, and jobs.
  • Developed log-monitoring jobs for Spark job performance, alerting teams on failures and exceeded task thresholds.
  • Created pipelines in GitLab for dev, test, and prod environments to create workflows.
  • Environment: Databricks, SparkSQL, PySpark, Python, Azure, SFTP, Airflow, JIRA, GitLab, Maven, Azure DevOps.
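The QC framework described above (required columns, column fill percentages, and outlier detection) can be sketched in a few lines of plain Python. This is an illustrative, stdlib-only sketch, not the production framework; the function name, column names, and thresholds are all assumptions.

```python
from statistics import mean, stdev

def qc_check(rows, required_columns, fill_threshold=0.9, z_threshold=3.0):
    """Run basic QC on a list of row dicts: required columns,
    per-column fill percentage, and z-score outliers on numeric columns.
    Illustrative sketch only; thresholds and messages are made up."""
    issues = []
    # 1. Required-column check: every required column must appear somewhere.
    present = set().union(*(row.keys() for row in rows)) if rows else set()
    for col in required_columns:
        if col not in present:
            issues.append(f"missing required column: {col}")
    # 2. Fill-percentage check: flag sparsely populated columns.
    for col in present:
        filled = sum(1 for row in rows if row.get(col) not in (None, ""))
        if filled / len(rows) < fill_threshold:
            issues.append(f"low fill rate for column: {col}")
    # 3. Outlier check: flag numeric values more than z_threshold
    #    standard deviations from the column mean.
    for col in present:
        values = [row[col] for row in rows
                  if isinstance(row.get(col), (int, float))]
        if len(values) > 2 and stdev(values) > 0:
            mu, sigma = mean(values), stdev(values)
            for v in values:
                if abs(v - mu) / sigma > z_threshold:
                    issues.append(f"outlier in {col}: {v}")
    return issues
```

In practice a check like this would run per partner feed and feed its `issues` list into the alerting utilities rather than returning it directly.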

Data / DevOps Engineer

MetiStream
Ember, VA
09.2022 - 02.2024
  • Developed a solution for the Ember Platform, decreasing the delay in data availability by 80% and boosting data availability by 100%.
  • Developed CloudFormation templates to orchestrate AWS EKS and EMR.
  • Developed scripts to automate and push Docker images to ECR.
  • Created Helm charts to pull and use the Docker images in EKS.
  • Exposed services using AWS Load Balancers with SSL termination.
  • Created CI/CD pipelines using Jenkins, including creating Docker images to use in EKS.
  • Performed impact analysis, performance tuning, and capacity planning for the enterprise data warehouse and its infrastructure as source systems are added and new integration business rules and logic are introduced.
  • Developed APIs to connect to Elasticsearch, and built ES DSL queries for Ember Dashboards.
  • Developed APIs to connect MongoDB to store metadata.
  • Implemented reusable components such as a logging wrapper and generic exception handling.
  • Tested and validated integrations with the cloud (AWS) and big data distributions (Cloudera).
  • Participated in architecture discussions, proposing ideas to enhance Ember functionality.
  • Performed code reviews on PRs and validated test cases.
  • Configured pipelines to alert stakeholders and production operations using Nagios.
  • Environment: Python, Spark SQL, Java, SQL Server, PySpark, AWS, EKS, Cloudera, ElasticSearch, ElasticSearch DSL, Git, Maven, Nagios, and JIRA.
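An Elasticsearch DSL query for a dashboard panel, as mentioned above, is ultimately just a structured JSON body. Below is a minimal sketch of building one as a plain Python dict; the index fields (`status`, `admit_date`, `department`) and the function name are illustrative assumptions, not the actual Ember schema.

```python
def build_dashboard_query(status=None, date_from=None, date_to=None, top_n=10):
    """Build an Elasticsearch DSL query body (as a plain dict) that filters
    documents and aggregates counts per department for a dashboard panel.
    Field names are hypothetical examples."""
    filters = []
    if status:
        filters.append({"term": {"status": status}})
    date_range = {}
    if date_from:
        date_range["gte"] = date_from
    if date_to:
        date_range["lte"] = date_to
    if date_range:
        filters.append({"range": {"admit_date": date_range}})
    return {
        "size": 0,  # aggregation-only: the dashboard needs buckets, not hits
        "query": {"bool": {"filter": filters}},
        "aggs": {
            "by_department": {
                "terms": {"field": "department.keyword", "size": top_n}
            }
        },
    }
```

A body like this would be posted to the cluster's `_search` endpoint (e.g. via the official Python client), and the dashboard would render the `by_department` buckets from the response.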

DevOps Engineer

Fidelity Investments
Raleigh, NC
10.2020 - 08.2022
  • Performed software configuration and release management activities for three different Java applications.
  • Designed and implemented Continuous Integration (CI) processes and tools, with approvals from development and other affected teams.
  • Defined processes to build and deliver software baselines for internal and external customers.
  • Coordinated with Anthill consultants to resolve licensing, technical, and ongoing issues, including Anthill patching, and application-related needs.
  • Collaborated with web administrators to set up automated deployment for SharePoint applications using Anthill and SVN tools.
  • Executed build operations using ANT scripts, modifying them as per project requirements.
  • Created and managed metadata types such as Branch, Label, Trigger, and Hyperlink; supported developers in creating config specs, and managed the merge process for project-specific branches.
  • Took ownership of the release branch, implementing triggers to enforce development policies and invoke operations before or after critical ClearCase events using Perl scripts.
  • Designed release plans in coordination with stakeholders, including project management, development leads, QA teams, and ClearCase administrators.
  • Worked on cross-platform environments (Windows NT and Linux) to ensure a thorough understanding and functionality of ClearCase.
  • Coordinated Change Control Board (CCB) meetings to discuss defects and enhancements, generating detailed reports to resolve issues before subsequent builds and testing.
  • Built version-controlled Java code on ClearCase Unified Change Management (UCM) project-based code streams, utilizing Visual Build Pro (VBP), and ANT scripts for VGS’ partners.
  • Environment: ClearCase, SVN, Shell, ANT, Hudson, JIRA, Linux, Windows, JBoss, Visual Basic 6.0, Visual SourceSafe 6.0, SQL Server, Perl, CruiseControl, Git, Maven.

Data Engineer

KPIT Technologies
Pune, India
01.2019 - 07.2020
  • Designed and developed the real-time matching solution for customer data ingestion.
  • Worked on converting the multiple SQL Server and Oracle stored procedures into Hadoop using Spark SQL, Hive, Scala, and Java.
  • Created a production data lake that can handle transactional processing operations using the Hadoop ecosystem.
  • Developed PySpark and Spark SQL code to process the data in Apache Spark on Amazon EMR to perform the necessary transformations.
  • Validated and cleansed data using Pig statements, with hands-on development of Pig macros.
  • Worked with Hadoop, Big Data Integration, and ETL on performing data extraction, loading, and transformation processes for ERP data.
  • Performed extensive exploratory data analysis using Teradata to improve the quality of the dataset.
  • Experienced with Python libraries such as Pandas and NumPy (one- and two-dimensional arrays).
  • Developed data visualizations in Power BI to display the day-to-day accuracy of the model with newly incoming data.
  • Utilized Jira for project management and Git for version control.
  • Reported and displayed the analysis results in the web browser with HTML and JavaScript.
  • Engaged constructively with project teams, supported the project's goals, and delivered insights for the team and client.
  • Environment: Hadoop, Python, Spark SQL, Hive, Java, SQL Server, PySpark, Tableau, Git, Maven, Power BI, JIRA.
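Converting cursor-style stored procedures to Spark SQL, as described above, usually means replacing row-by-row loops with a single set-based query. The sketch below uses Python's built-in sqlite3 as a stand-in engine purely for illustration; the table, columns, and data are invented, and under Spark the same query text would run via `spark.sql(...)`.

```python
import sqlite3

# A cursor-based stored procedure that loops over orders accumulating a
# per-customer total can be rewritten as one set-based aggregation.
# sqlite3 stands in for Spark SQL here; schema and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10.0), (1, 15.0), (2, 7.5);
""")
totals = dict(conn.execute("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall())
# totals == {1: 25.0, 2: 7.5}
```

The set-based form lets the engine parallelize the aggregation, which is where most of the speedup over procedural cursor loops comes from on Spark.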

SQL Developer

iAppSoft
Hyderabad, India
05.2017 - 12.2018
  • Company Overview: clinical data standardization (SDTM) | Hyderabad, India
  • Standardized and quality-checked clinical trial data (more than 2 lakh tables with differing schemas) using clustering, unification, transformation, and analytical and statistical techniques.
  • Automated this process of analyzing various tables based on the domain (functional area).
  • Gathered requirements for building in-house tools for the curation process.
  • Studied SDTM and other clinical data models, curation processes, and ETL tools to develop new techniques that fit and accelerate the process.
  • Performed performance and quality testing of the standardization process using an in-house application.
  • Analyzed various studies, staged them based on set business rules, and identified new business rules for the same.
  • Profiled all tables available across servers and databases periodically.
  • Compared the previous month's profiles with the current month's to identify unused and redundant databases and tables, freeing up space and forming a structured database.
  • Environment: Oracle, SQL Developer, Windows.
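The month-over-month profile comparison described above reduces to a diff between two snapshots. A minimal sketch, assuming profiles are simple table-to-row-count mappings (the real profiles would carry more metadata):

```python
def diff_profiles(prev, curr):
    """Compare last month's table profiles (table name -> row count) with
    this month's to flag dropped, new, and unchanged tables; unchanged
    tables are candidates for the unused/redundant review.
    Illustrative sketch; real profiles would include more metadata."""
    dropped = sorted(set(prev) - set(curr))
    added = sorted(set(curr) - set(prev))
    unchanged = sorted(t for t in prev.keys() & curr.keys()
                       if prev[t] == curr[t])
    return {"dropped": dropped, "added": added, "unchanged": unchanged}
```

Tables that stay byte-for-byte identical across several monthly runs are the ones worth investigating before reclaiming space.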

Education

Master of Science - Information Systems

Indiana Tech University
Fort Wayne, Indiana, United States

Bachelor of Science - Computer Science and Engineering

PRIST University
Thanjavur, Tamil Nadu, India

Skills

  • PostgreSQL
  • Oracle
  • MySQL
  • MongoDB
  • Java
  • Python
  • SQL
  • Shell Scripting
  • Airflow
  • Dataflow
  • StreamSets
  • Databricks
  • Cloudera
  • Spark
  • Hadoop
  • PySpark
  • SparkSQL
  • DeltaLake
  • Kafka
  • Docker
  • Kubernetes
  • EKS
  • Azure
  • AWS
  • Jenkins
  • Ansible
  • Terraform
  • PyCharm
  • IntelliJ
  • Eclipse
  • SQL Developer
  • Power BI
  • JIRA

Accomplishments

  • Led successful migrations of on-prem workloads to AWS and Azure, improving scalability and performance.
  • Designed and implemented disaster recovery strategies, ensuring minimal downtime during critical incidents.
  • Mentored junior engineers, fostering a culture of knowledge sharing and professional growth.

Timeline

Data Engineer

IQVIA
03.2024 - Current

Data / DevOps Engineer

MetiStream
09.2022 - 02.2024

DevOps Engineer

Fidelity Investments
10.2020 - 08.2022

Data Engineer

KPIT Technologies
01.2019 - 07.2020

SQL Developer

iAppSoft
05.2017 - 12.2018

Master of Science - Information Systems

Indiana Tech University

Bachelor of Science - Computer Science and Engineering

PRIST University