Summary
Overview
Work History
Education
Skills
Skills To Prioritize
Cloud Technologies
Databases
Data Engineering
Timeline

Nithin Kumar

United States

Summary

Senior Cloud Data Engineer with 8+ years of IT experience in data engineering, system analysis, design, development, implementation, testing, and support of databases, data warehouse applications, and data visualization, specializing in real-time analytics and ETL development across on-premises platforms (Microsoft, Oracle) and cloud platforms (Azure, AWS, and Google Cloud (GCP)). Core technologies include Azure Synapse Analytics, Data Factory, Data Lake, Blob Storage, PostgreSQL, AWS services, Google Cloud, SQL DW, SQL, Power BI, SSRS, SSAS, and SSIS.

  • Expert in data migration and data quality assurance; led performance tuning that cut processing times by 20%
  • Designed and implemented cloud data architectures, ETL pipelines, and data warehouses supporting business-critical applications
  • Integrated Google Cloud resources, including BigQuery, Cloud Pub/Sub, Cloud Storage, and Cloud Functions, into end-to-end data pipelines that automated data flow and improved data accessibility
  • Participated in integrations between Google Cloud databases (BigQuery, Cloud SQL) and external services, optimizing data synchronization and minimizing downtime
  • Applied best practices in BigQuery schema design (e.g., denormalization, data flattening) to reduce query complexity and improve performance
  • Automated the optimization of long-running and resource-heavy queries using BigQuery's query execution and performance monitoring tools, enabling better resource management and faster results
  • Skilled in high-level design of ETL and SSIS packages that integrate data over OLE DB connections from heterogeneous sources (Excel, CSV and flat files, MS Access, SAS, Oracle, DB2, CRM) using SSIS transformations such as Data Conversion, Conditional Split, Bulk Insert, Derived Column, Merge, Merge Join, and Union All
  • Implemented Copy activity, Data Flow, and Trigger activities in Azure Data Factory pipelines for on-cloud ETL processing
  • Strong understanding of database structures, theories, principles, and practices, with a knack for writing complex SQL queries
  • Collaborated with cross-functional teams to deliver solutions aligned with business objectives, streamlining data processes and improving system performance

Overview

8 years of professional experience

Work History

Sr. Data Engineer

CME (Chicago Mercantile Exchange)
Chicago, United States
03.2024 - Current
  • Developed and optimized complex SQL queries for analyzing large datasets within Google BigQuery, resulting in significant performance improvements and faster insights
  • Automated data extraction, transformation, and loading (ETL) processes using BigQuery to enable seamless data warehousing solutions
  • Managed BigQuery datasets and tables, implementing partitioning and clustering strategies for better performance and cost optimization
  • Developed complex data transformation jobs using Python and Spark for real-time and batch processing
  • Designed and implemented real-time data streaming pipelines using Google Pub/Sub for ingesting and processing large volumes of data in a scalable manner
  • Configured message topics, subscriptions, and push/pull mechanisms to enable decoupled communication between services, improving system reliability
  • Leveraged Google Pub/Sub to integrate event-driven architectures and enable microservices communication in cloud-based applications
  • Implemented data transformation workflows using Google Dataform, enabling efficient data pipeline management and version control within cloud environments
  • Integrated Google Dataform with BigQuery for seamless management and automation of SQL-based data transformations
  • Collaborated with data engineering teams to create modular, maintainable, and scalable data models using Dataform’s SQL-based approach
  • Designed and built data ingestion pipelines to load structured and unstructured data into Google BigQuery, ensuring real-time data availability for analytics
  • Integrated diverse data sources (APIs, Cloud Storage, Pub/Sub) with BigQuery, optimizing the extraction and loading process for large-scale datasets
  • Architected cloud-based data storage solutions using Google Cloud Storage, facilitating secure and scalable storage for both structured and unstructured data
  • Managed access control policies and implemented data lifecycle management strategies within Cloud Storage to optimize storage costs and data retention
  • Leveraged Cloud Storage to support data lakes and backups, ensuring high availability and redundancy for critical datasets
  • Wrote manual SQL procedures for ETL and data-cleansing tasks in the ETL load to the ODS
  • Orchestrated and automated end-to-end data workflows using Cloud Composer, enabling seamless scheduling and monitoring of complex data pipelines
  • Integrated various Google Cloud services (BigQuery, Cloud Storage, Pub/Sub) into Cloud Composer to ensure efficient execution of multi-step data processing tasks
  • Streamlined operational processes by automating recurring tasks such as data loading, transformation, and reporting using Cloud Composer
  • Used Spark SQL to load Parquet data, created Datasets defined by case classes, managed structured data with Spark SQL, and stored the results in Hive tables for downstream consumption (a minimal sketch follows this section)
  • Managed version control for data engineering projects using GitHub, enabling collaboration, code reviews, and efficient team workflows
  • Implemented CI/CD pipelines for automated testing and deployment of data pipeline code, improving overall development efficiency and code quality
  • Documented and maintained repository guidelines, ensuring code consistency and best practices across development teams
  • Developed data pipelines using Apache Kafka to stream and process real-time data
  • Managed version control and deployment of Python-based ETL scripts in Git and Jenkins
  • Delivered technical documentation for ETL processes and cloud-based architecture
  • Actively participated in sprint planning and agile ceremonies to ensure timely delivery of data engineering tasks
  • Coordinated the migration of Oracle databases to GCP CloudSQL for PostgreSQL
  • Performed data copy from on-premises Oracle databases to GCP CloudSQL
  • Configured CDC for primary schemas to ensure data consistency and real-time updates
  • Conducted extensive query performance tuning to optimize database performance post-migration and collaborated with development teams to troubleshoot and resolve performance-related issues
  • Refactored and rewrote SQL code to enhance performance and maintainability in PostgreSQL
  • Utilized GCP tools (BigQuery) and services for efficient database management and monitoring
  • Implemented security measures to protect sensitive data during migration
  • Provided training and support to junior developers on best practices in database development and migration
  • Analyzed and optimized hardware configurations to improve database performance
  • Environment: Google Pub/Sub, ETL, SQL Developer, GCP, BigQuery, PostgreSQL 11.5/10.12/9.5, Spark, MySQL, SQL*Loader, Data Migration Services (DMS), Oracle SQL Developer, Ora2PG, DataStage, Confluence, Hive, Unix, Linux, Windows, Google Cloud, Microsoft Visual Studio, CI/CD, MS Excel, Python 3.8, Shell Scripting, Terraform, GitHub, DB2, Spark SQL, Spark DF/DS
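
Illustrative example for the Spark SQL bullet above (Parquet loaded, transformed with Spark SQL, and persisted to Hive): a minimal PySpark sketch, not the production pipeline. Bucket, database, and table names are placeholders, and the Scala case-class Datasets mentioned in the bullet are omitted.

    # Minimal PySpark sketch: load Parquet, transform with Spark SQL, store in Hive.
    # Paths, database, and table names below are placeholders.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("parquet-to-hive")
        .enableHiveSupport()  # needed so saveAsTable writes to the Hive metastore
        .getOrCreate()
    )

    # Load the raw Parquet data and expose it to Spark SQL.
    trades_df = spark.read.parquet("gs://example-bucket/raw/trades/")
    trades_df.createOrReplaceTempView("trades_raw")

    # Structured transformation expressed in Spark SQL.
    daily_summary = spark.sql("""
        SELECT trade_date,
               symbol,
               COUNT(*)      AS trade_count,
               SUM(notional) AS total_notional
        FROM trades_raw
        GROUP BY trade_date, symbol
    """)

    # Persist the result into a Hive table for downstream consumption.
    (
        daily_summary.write.mode("overwrite")
        .format("parquet")
        .saveAsTable("analytics.daily_trade_summary")
    )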

Sr. Data Engineer

UPMC (University of Pittsburgh Medical Center)
Pittsburgh, United States
03.2022 - 02.2024
  • Remote position
  • Gathering, analyzing, and documenting the business requirements and business rules by directly working with Business Analysts
  • Involved in the entire SDLC (Software Development Life Cycle) process that includes implementation, testing, deployment, documentation, training, and maintenance
  • Generated on-demand and scheduled reports for business analysis or management decision using SQL Server Reporting Services
  • Designed and developed AWS-based data pipeline solutions using AWS Lambda and Glue to process large volumes of healthcare data
  • Worked on ETL migration by developing and deploying AWS Lambda functions that generate a serverless data pipeline, writing to the Glue Data Catalog so the data can be queried from Athena
  • Worked with Healthcare Data/Claims 835 and 837 formats for analytical purposes, X12 Enterprise Data Integration (EDI), PHI
  • Worked with X12 EDI standards for healthcare data, HEDIS, and HIPAA
  • Developed complex ETL Packages using SQL Server 2019 Integration Services to load data from various sources like SQL Server/DB2 to Staging Database and then to Data Warehouse
  • Built the entire infrastructure that is required for optimal ETL from a variety of data sources using AWS, mainly with Pyspark
  • Used Spark-SQL to process the data and to run on Spark engine
  • Used Spark for improving performance and optimization on existing algorithms using Spark-SQL
  • Used the AWS Glue Data Catalog with crawlers to catalog data in S3 and ran SQL queries against it using AWS Athena
  • Created external tables with partitions using AWS Athena and Redshift
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them
  • Developed a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs
  • Created PySpark data frames to bring data from DB2 into Amazon S3 (a minimal sketch follows this section)
  • Created AWS Glue job for archiving data from Redshift tables to S3 (online to cold storage) as per data retention requirements
  • Created and enforced policies to achieve HIPAA compliance
  • Data mapping, migration, and conversion to Data Warehouse Platform
  • Defined and deployed monitoring, metrics, and logging systems on AWS
  • Performed logical and physical data modeling using Erwin for the data warehouse database in a star schema
  • Developed end to end ETL pipeline using Spark-SQL, Python and Salesforce on Spark engine
  • Developed Spark jobs, clean data from various feeds to make it suitable for ingestion and analysis
  • Imported data from various sources into Spark RDD for analysis
  • Worked on AWS Data Pipeline to configure data loads from S3 into Redshift
  • Created Python and UNIX shell scripts while interacting with different AWS services
  • Developed Complex mappings to feed Data warehouse & Data marts by extensively using Mapplets, and Transformations like Lookup, Filter, Router, Expression, Aggregator, Joiner, Stored Procedure and Update Strategy
  • Implemented Event Handlers, Package Configurations, Logging, System and User-defined Variables, Check Points and Expressions for SSIS/ ETL Packages
  • Created AWS S3 buckets, performed folder management in each bucket, managed cloud trail logs and objects within each bucket
  • Completed POC on usability of AWS using EC2 and S3 storage and lambda functions
  • Developed ETL processes for the data warehouse to clean and standardize data and metadata for loading into the data warehouse
  • Created external tables with partitions using Hive, AWS Athena, and Redshift
  • Optimized SQL queries using indexes and execution plans for maximum efficiency and performance
  • Automated mappings to run using UNIX shell scripts, which included Pre- and Post-session jobs and extracted data from Transaction System into Staging Area
  • Wrote scripts and indexing strategy for migration to AWS Redshift from SQL Server and MySQL databases
  • Modified SSIS ETL packages for changed business requirements to load data from various sources into SQL Server tables in the data warehouse
  • Performed Developer Testing, Functional testing, Unit testing and created Test Plans and Test Cases
  • Environment: ETL, MS SQL Server 2019/2017/2016/2012/2008 R2, MySQL, SSIS, Data Warehouse, Agile, PySpark, AWS Glue, AWS Batch, AWS S3, Athena, Redshift, Python 3.8, Shell Scripting, Spark 3.0, Spark SQL, Spark DF/DS
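
Illustrative example for the DB2-to-S3 bullet above: a minimal PySpark sketch using assumed placeholder connection details, table, and bucket names; it is not the actual UPMC pipeline, and it assumes the DB2 JDBC driver is on the cluster classpath.

    # Minimal PySpark sketch: pull a DB2 table over JDBC and land it in S3 as Parquet.
    # Host, credentials, table, and bucket names below are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("db2-to-s3-ingest").getOrCreate()

    # Read the source table from DB2 via the JDBC data source.
    claims_df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:db2://db2-host:50000/CLAIMSDB")
        .option("dbtable", "EDW.CLAIMS_835")
        .option("user", "etl_user")
        .option("password", "********")
        .option("driver", "com.ibm.db2.jcc.DB2Driver")
        .load()
    )

    # Light cleanup before landing the data.
    clean_df = claims_df.dropDuplicates()

    # Write partitioned Parquet to S3 so Glue crawlers and Athena can pick it up.
    (
        clean_df.write.mode("overwrite")
        .partitionBy("CLAIM_YEAR")
        .parquet("s3a://example-healthcare-raw/claims_835/")
    )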

SSIS/SQL/Azure SQL Data Integration Engineer

Gallagher Bassett
Rolling Meadows, United States
04.2018 - 03.2022
  • Gathering, analyzing, and documenting the business requirements and business rules by directly working with end users/clients
  • Created and maintained documents like Documentation Roadmap, DATA Models, ETL Execution Plan, Configuration Management Procedures, and Project Plan
  • Designed and developed various SSIS packages (ETL) to extract and transform data and involved in scheduling and deploying SSIS Packages
  • Extended SSIS capabilities by creating custom SSIS components for SharePoint
  • Implemented various tasks and transformations for data cleansing and performance tuning of SSIS packages
  • Extracted and Loaded SharePoint Lists and Surveys Data into SQL Server Database with SSIS
  • Created complex stored procedures, triggers, functions (UDFs), indexes, tables, views, and other T-SQL code and SQL joins for SSIS packages and SSRS reports
  • Worked on migration of data from On-prem SQL server to Cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB)
  • Configured SSIS packages with XML configuration files, environment variables, parent package variables, and SQL Server tables
  • Worked with SSIS Data flow task, Transformations, For-loop containers and Fuzzy Lookups, Data conversion and configured data in slowly changing dimensions
  • Performed SSIS development and support; developed ETL solutions for integrating data from multiple sources such as flat files (delimited, fixed width), Excel, SQL Server, raw files, and DB2 into the central OLTP database
  • Worked on creating dependencies between activities in Azure Data Factory
  • Propagated XML data through a policy engine to the master database and the data lineage database
  • Created stored procedures and scheduled them in the Azure environment
  • Monitored produced and consumed datasets in ADF
  • Created data factories in Azure Data Factory
  • Built complex ETL jobs that transform data visually with data flows or by using compute services such as Azure Databricks and Azure SQL Database
  • Created multiple datasets in Azure; worked on the migration of SQL Server 2008/2012/2014 to SQL Server 2017
  • Moved CSV files from Azure Blob Storage to Azure SQL Database (a minimal Databricks sketch follows this section)
  • Analyzed existing databases, tables, and other objects to prepare to migrate to Azure Synapse
  • Worked on on-prem data warehouse migration to Azure Synapse using PolyBase and ADF
  • Migrated on-prem ETLs from MS SQL Server to Azure using Azure Data Factory and Databricks
  • Involved in configuring Azure platform for data pipelines, ADF, Azure Blob Storage and Data Lakes and building workflows to automate data flow using ADF
  • Developed metadata management program with business and technical definitions, data lineage, life cycle, identify authoritative source
  • Analyzed data from different sources using the Hadoop big data stack, implementing Azure Data Factory, Azure Data Lake, Azure Synapse, Azure Data Lake Analytics, HDInsight, Hive, and Sqoop
  • Forecast data space requirements using capacity planning, mapped source-to-target data using the data lineage feature, and performed impact analysis
  • Created Azure data factories for loading data into Azure SQL Database from the Cosmos platform
  • Developed and documented data modeling standards to establish across the organization
  • Created and managed backend process to meet diverse business needs
  • Performed data loading, data import/export, data transmission
  • Involved in data modeling Star Schemas according to the Business requirements using Erwin 6.3.8
  • Worked extensively on system analysis, design, development, testing and implementation of projects
  • Assist with load testing activities by setting up database counters to monitor backend performance
  • Pulled data into MS SQL Server Database from different DBMS databases like Oracle and Teradata with ETL packages
  • Created and scheduled Various SQL Jobs, using SQL Server Agent to perform various administrative tasks
  • Environment: ETL, MS SQL Server 2017/2016/2012/2008, PDW, MS SQL Server Master Data Services, SQL Server Integration Services (SSIS), MS Visio, Azure Data Factory, Azure Databricks, MS Visual Studio/.NET 2008, TFS, VSTS, Azure DevOps, SharePoint
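
Illustrative example for the Blob-to-Azure-SQL bullet above: a minimal Databricks/PySpark sketch with placeholder storage account, container, server, and table names; authentication via an account key is an assumption (a SAS token or service principal would work similarly).

    # Minimal PySpark sketch: read CSV landing files from Azure Blob Storage and
    # append them into an Azure SQL Database staging table. All names are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("blob-csv-to-azure-sql").getOrCreate()

    # Authenticate to the storage account (account key assumed here).
    spark.conf.set(
        "fs.azure.account.key.examplestorage.blob.core.windows.net",
        "<storage-account-key>",
    )

    # Read the CSV files from the landing container.
    claims_df = (
        spark.read
        .option("header", True)
        .option("inferSchema", True)
        .csv("wasbs://landing@examplestorage.blob.core.windows.net/claims/*.csv")
    )

    # Append into the Azure SQL staging table over JDBC.
    (
        claims_df.write.format("jdbc")
        .option("url", "jdbc:sqlserver://example-server.database.windows.net:1433;database=ClaimsDW")
        .option("dbtable", "stg.Claims")
        .option("user", "etl_user")
        .option("password", "********")
        .mode("append")
        .save()
    )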

SQL Database Developer

Techno Vision Solutions
Farmington Hills, United States
01.2017 - 04.2018
  • Created Complex ETL Packages using SSIS for extracting, cleaning, transforming, and loading data from staging tables to partitioned tables with incremental load
  • Developed, deployed, and monitored SSIS Packages
  • Automated data processes for information input using SSIS
  • Developed SSIS packages to extract, transform and load data from Oracle and SQL Server databases into Data Warehouse
  • Deployed SSIS packages into various Environments (Dev, UAT and Prod) using Deployment Utility
  • Implemented data validation using T-SQL queries to confirm that the data returned by SSRS reports was correct
  • Designed Access databases for reporting and claims data analysis
  • Created tables and wrote table updates and inserts to manipulate the data
  • Created packages using various control flow tasks such as Data Flow Task, Execute SQL Task, Execute Package Task, and File System Task
  • Designed and implemented multiple dashboards for internal metrics using Azure Synapse - PowerPivot & Power Query tools
  • Involved in creation of Data Warehouse database Physical Model, Logical Model using Erwin data modeling tool
  • Designed the data warehouse and created the mappings from source to target tables
  • Created Indexes for faster data retrieval from the database
  • Developed Spark applications in Databricks using PySpark and Spark SQL to perform transformations and aggregations on source data before loading it into Azure Synapse Analytics for reporting (a minimal sketch follows this section)
  • Wrote stored procedures and functions for better performance and flexibility
  • Created SQL Server jobs and scheduled them to load data periodically using SQL server Agent
  • Transformed data from DB2 to SQL Server
  • Developed Aggregations, partitions, and calculated members for cube as per business requirements
  • Defined appropriate measure groups and KPIs and deployed cubes
  • Created parameterized, crosstab, drill-down, and summary reports using SSRS and created report snapshots to improve SSRS performance
  • Environment: ETL, MS SQL Server 2012/2008, MS SQL Server Master Data Services 2012, SQL Server Integration Services (SSIS), SSRS, MS Visual Studio/.NET 2008, BI Extractor, Microsoft Excel, Excel Add-In with Master Data
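
Illustrative example for the Databricks bullet above (transform and aggregate source data, then load it for reporting): a minimal PySpark sketch with placeholder paths, server, and table names; the JDBC write shown is one option, and the Databricks-native Synapse connector could be used instead.

    # Minimal PySpark sketch: aggregate curated data and load the summary into a
    # reporting database over JDBC. All paths and names below are placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("claims-summary-to-synapse").getOrCreate()

    # Source data already curated in the data lake as Parquet.
    claims_df = spark.read.parquet("/mnt/datalake/curated/claims/")

    # Aggregate for the reporting layer.
    summary_df = (
        claims_df.groupBy("claim_year", "provider_id")
        .agg(
            F.count("claim_id").alias("claim_count"),
            F.sum("paid_amount").alias("total_paid"),
        )
    )

    # Load the summary into the reporting schema.
    (
        summary_df.write.format("jdbc")
        .option("url", "jdbc:sqlserver://example-synapse.sql.azuresynapse.net:1433;database=ReportingDW")
        .option("dbtable", "rpt.ClaimSummary")
        .option("user", "etl_user")
        .option("password", "********")
        .mode("overwrite")
        .save()
    )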

Education

Master of Science - Computer Science

Silicon Valley University
San Jose, CA
01-2017

Skills

  • Real-time analytics
  • Scripting languages
  • Data migration
  • ETL development
  • SQL expertise
  • Data quality assurance
  • Performance tuning
  • Data governance
  • NoSQL databases
  • BigQuery
  • Google Cloud (GCP)
  • Azure Data Factory
  • Python & Spark
  • SQL & NoSQL
  • Data Migration & ETL
  • SQL Server 2019/2017/2016/14/12/08 R2
  • Azure SQL Data Warehouse
  • MySQL
  • Oracle 12c/11g
  • PostgreSQL 11.5, 10.12, 9.5
  • MS Access 2000/8.0
  • Teradata 14.0
  • DB2
  • SQL Server Management Studio
  • SSDT 2016/14/12
  • BIDS 2008R2/08
  • SQL Server Integration Services (SSIS 2019/2017/2016/2014/2008 R2)
  • SQL Server Reporting Services (SSRS 2016/2 R2)
  • Power BI
  • Azure Data Warehouse
  • Azure SQL
  • Azure Data Lake
  • Azure Databricks
  • AWS (Lambda, Glue, EMR, Redshift, S3)
  • Google Cloud (GCP) SQL
  • T-SQL
  • PL-SQL
  • PySpark
  • Spark with SQL
  • PL/pgSQL
  • Linux
  • Bash
  • Shell Scripting
  • Python
  • Scala
  • ETL
  • Data Modeling
  • Data Warehousing
  • Real-time Data Processing
  • Data Governance
  • Metadata management
  • Big data processing
  • Data security
  • Data pipeline control
  • Data integration
  • Data pipeline design
  • Spark framework

Skills To Prioritize

  • BigQuery
  • Google Cloud (GCP)
  • Azure Data Factory
  • Python & Spark
  • SQL & NoSQL
  • Data Migration & ETL

Cloud Technologies

  • Azure Data Warehouse
  • Azure SQL
  • Azure Data Lake
  • Azure Data Factory
  • Azure Databricks
  • AWS (Lambda, Glue, EMR, Redshift, S3)
  • Google Cloud (GCP)

Databases

  • SQL Server 2019/2017/2016/14/12/08 R2
  • Azure SQL Data Warehouse
  • MySQL
  • Oracle 12c/11g
  • PostgreSQL 11.5, 10.12, 9.5
  • MS Access 2000/8.0
  • Teradata 14.0
  • DB2
  • SQL Server Management Studio
  • SSDT 2016/14/12
  • BIDS 2008R2/08
  • SQL Server Integration Services (SSIS 2019/2017/2016/2014/2008 R2)
  • SQL Server Reporting Services (SSRS 2016/2 R2)
  • Power BI

Data Engineering

  • ETL
  • Data Modeling
  • Data Warehousing
  • Real-time Data Processing
  • Data migration
  • Data Governance
  • Real-time analytics
  • Data quality assurance
  • Performance tuning

Timeline

Sr. Data Engineer

CME (Chicago Mercantile Exchange)
03.2024 - Current

Sr. Data Engineer

UPMC (University of Pittsburgh Medical Center)
03.2022 - 02.2024

SSIS/SQL/Azure SQL Data Integration Engineer

Gallagher Bassett
04.2018 - 03.2022

SQL Database Developer

Techno Vision Solutions
01.2017 - 04.2018

Master of Science - Computer Science

Silicon Valley University