
Aditya Nuthalapati

Durham, NC

Summary

Experienced, results-oriented, resourceful, problem-solving Data Engineer with 8 years of diverse experience in the Information Technology field, including the development and implementation of applications for storage, querying, and processing in Big Data and cloud environments.

Overview

11 years of professional experience
1 Certification

Work History

Senior Data Engineer

IQVIA
09.2019 - Current

We are working on data migration for patients with the American Heart Association and the American College of Surgeons. The main aim of the project is to migrate legacy data from traditional databases to GCP and distributed environments. As part of this effort, many applications were consolidated into modern tools, such as moving from the Teradata system to the Snowflake database, and an advanced schema registry called Nebula is used to capture the nature of the patient data. Oracle is the data warehouse on top of GCP, where any kind of ad hoc query can be run with in-memory techniques at very low latency, scanning even petabytes of data in seconds. We also implemented Lambda jobs to push files from one source to another without manual effort, and I have strong expertise in IAM role creation and configuration changes between VPCs. Provisioned a dev Hadoop cluster on Google Compute Engine instances, installed all the clients through Ambari, and attached EBS volumes to provide enough resources for the development team.

Responsibilities:

  • Migrated an entire Oracle database to BigQuery using Change Data Capture, with real-time collection of change data from the Oracle database and real-time delivery to Google BigQuery.
  • Developed BigQuery queries to extract and deliver meaningful insights to stakeholders.
  • Implemented an ETL process to streamline the import of data from various sources into the BigQuery warehouse.
  • Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators.
  • Optimized data pipelines to reduce costs by 30% while ensuring data integrity and accuracy.
  • Wrote and maintained Spark applications in Scala, developing efficient and scalable code for data processing and analysis using the Spark APIs.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Created jobs specifically for performance tuning and optimization of multiple Hive tables.
  • Experienced in optimizing the performance of Spark jobs and Hive SQL queries.
  • Developed Spark jobs to load and process data in multiple formats, including Excel, XML, fixed-width, CSV, Parquet, JSON, and XPT.
  • Developed Spark applications in Scala to migrate data between multiple sources: from a traditional Oracle data warehouse to Hive, between Denodo and Hive, and through web service calls between Trifacta and Hive.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/Datasets in Spark 2.0 for data aggregation, queries, and migration.
  • Implemented partitioning and bucketing on the respective Hive tables based on the requirements.
  • Automated Spark jobs using Airflow, Oozie, and crontab expressions in Linux for workflow management.
  • Configured and deployed Google Cloud clusters, selecting appropriate instance types and cluster sizes based on workload requirements; optimized cluster settings for performance and cost, considering factors such as auto-scaling and instance groups.
  • Ingested data from various sources into Google Cloud Storage buckets and designed data storage strategies for efficient and scalable processing; implemented processes for moving data between Cloud Storage and Dataproc, ensuring optimal performance.
  • Integrated Spark applications, Google Cloud clusters, and BigQuery with other GCP services, such as Cloud SQL, for ETL processes.
  • Developed and maintained ETL processes using Apache Spark to integrate data from SFTP and Oracle into Snowflake.
  • Developed automated workflows and scheduling mechanisms for Spark jobs interacting with Snowflake.
  • Implemented data validation and quality checks within Spark jobs to ensure the integrity of data being processed and loaded into Snowflake.
  • Deployed the build code in Test/UAT environment through the CI/CD pipelines built using Git.
  • Actively involved in code review and bug fixing to improve the performance.
  • Worked proactively, independently, and with global teams to address project requirements and to articulate issues and challenges with enough lead time to mitigate project delivery risk.

Big Data Analyst

IMS Health
05.2016 - 08.2019
  • To create a common learner data model, we used Spark to perform the necessary transformations and actions on the fly, consuming data from Kafka in near real time and persisting it into Cassandra.
  • Developed Spark scripts that reduced the organization's costs by 30% and migrated legacy systems from Cassandra and Oracle to build a data lake in Hadoop.
  • Loaded data from different sources (Teradata, Cassandra, and Oracle) into HDFS using Sqoop and into partitioned Hive tables.
  • Designed, developed, and optimized Apache Spark applications using Spark SQL, DataFrames, and RDDs to process and analyze large-scale datasets.
  • Implemented ETL processes to transform and load data from Spark into Cassandra.
  • Optimized Spark jobs and queries to interact efficiently with Cassandra tables, leveraging the Spark-Cassandra connector and Cassandra's query and indexing mechanisms for improved performance.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark 2.0 for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
  • Experienced in performance tuning of Spark RDDs and in-memory data computations to generate the output response.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Implemented an Elasticsearch and Logstash stack to collect and analyze the logs produced by the Spark cluster.
  • Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
  • Migrated MapReduce programs into Spark transformations using Spark and Scala.
  • Migrated existing HiveQL code logic into Spark SQL applications for data transformation and aggregation, writing the results back to Hive tables.
  • Implemented dimensional data modeling to deliver multi-dimensional star and snowflake schemas, normalizing dimension tables as appropriate in the data lake.
  • Worked extensively with Impala to provide low-latency querying of Hive tables for end users.
  • Developed Oozie workflows and sub-workflows to orchestrate Sqoop scripts, Hive queries, and Spark scripts, automating the ETL process.
  • Created multiple interactive dashboards and views in Tableau Desktop 10 by connecting Tableau to Hive and Cassandra as data sources.

Java Developer

ServerIT Solutions
05.2013 - 06.2014
  • Involved in the complete SDLC of the project from requirement analysis to testing.
  • Developed the modules based on the Spring MVC framework.
  • Used Spring MVC RESTful technology with JAX-RS to develop web services supporting JSON and XML.
  • Developed and configured the persistence layer using the Hibernate framework.
  • Developed the GUI using JavaScript, jQuery, HTML, and CSS for interactive cross-browser functionality and a complex user interface.
  • Developed and consumed RESTful web services, using them to validate user details in real-time scenarios.
  • Coded business logic using Servlets and Session Beans, deploying them on the Tomcat web server.
  • Used the Struts MVC framework for the application design.
  • Created SQL queries, PL/SQL standard procedures and functions for the back-end.
  • Involved in writing the stored procedures for database cross validations.
  • Identified the Test cases by working with the testing team.
  • Wrote test cases in JUnit for unit testing of classes.
  • Involved in preparing Code Review, Deployment and Documentation.

Education

Master of Science - Computer Science

North Dakota State University
Fargo, ND
05.2017

Bachelor of Technology - Electronics And Communication Engineering

Jawaharlal Nehru Technological University
Kakinada
05.2013

Skills

  • Cloud Technologies - Google Cloud Platform (GCP): BigQuery, Cloud Dataproc, Google Cloud Storage, Cloud Composer, Cloud Functions
  • Big Data Components - Hadoop, MapReduce, YARN, Hive, Sqoop, Oozie, Kafka, Impala, Hue, HBase and Spark (Core, Spark SQL and Streaming)
  • Databases - Oracle, Teradata and MySQL
  • Programming Languages - Scala, Python, Core Java, SQL and Shell Scripting
  • File Formats - Parquet, Avro, ORC, JSON, XML, different delimiter files and fixed length data files
  • Data Integration- Denodo
  • Data Visualization - Tableau and Trifacta
  • Orchestration - Airflow, Oozie and Crontab
  • Tools - Maven, Jenkins, IntelliJ, JIRA, GitHub and Bitbucket

Certification

  • Certified Developer on Apache Spark, O’Reilly Media Inc.
  • Certified Tableau Desktop 10 Qualified Associate, Tableau Software, 2017.
