Summary
Overview
Work History
Education
Skills
Skills To Prioritize
Cloud Technologies
Databases
Data Engineering
Timeline

Nithin Kumar

United States

Summary

Senior Cloud Data Engineer with 8+ years of IT experience in data engineering, system analysis, design, development, implementation, testing, and support of databases, data warehouse applications, and data visualization, specializing in real-time analytics and ETL development across on-premises platforms (Microsoft, Oracle) and cloud platforms (Azure, AWS, and Google Cloud (GCP)). Core technologies include Azure Synapse Analytics, Data Factory, Data Lake, Blob Storage, PostgreSQL, AWS services, Google Cloud, SQL DW, SQL, Power BI, SSRS, SSAS, and SSIS.

  • Expert in data migration and data quality assurance; led performance tuning that cut processing times by 20%
  • Designed and implemented cloud data architectures, ETL pipelines, and data warehouses supporting business-critical applications
  • Integrated Google Cloud resources, including BigQuery, Cloud Pub/Sub, Cloud Storage, and Cloud Functions, into end-to-end data pipelines that automated data flow and improved data accessibility
  • Participated in integrations between Google Cloud databases (BigQuery, Cloud SQL) and external services, optimizing data synchronization and minimizing downtime
  • Applied best practices in BigQuery schema design (e.g., denormalization, data flattening) to reduce query complexity and improve performance
  • Automated the optimization of long-running and resource-heavy queries using BigQuery's query execution and performance monitoring tools, enabling better resource management and faster results
  • Skilled in high-level design of ETL and SSIS packages that integrate data over OLE DB connections from heterogeneous sources (Excel, CSV and flat files, MS Access, SAS, Oracle, DB2, CRM) using SSIS transformations such as Data Conversion, Conditional Split, Bulk Insert, Derived Column, Merge, Merge Join, and Union All
  • Implemented Copy activity, Data Flow, and Trigger activities in Azure Data Factory pipelines for on-cloud ETL processing
  • Strong understanding of database structures, theories, principles, and practices, with a knack for writing complex SQL queries
  • Collaborated with cross-functional teams to deliver solutions aligned with business objectives, streamlining data processes and improving system performance

Overview

8 years of professional experience

Work History

Sr. Data Engineer

CME (Chicago Mercantile Exchange)
Chicago, United States
03.2024 - Current
  • Developed and optimized complex SQL queries for analyzing large datasets within Google BigQuery, resulting in significant performance improvements and faster insights
  • Automated data extraction, transformation, and loading (ETL) processes using BigQuery to enable seamless data warehousing solutions
  • Managed BigQuery datasets and tables, implementing partitioning and clustering strategies for better performance and cost optimization
  • Developed complex data transformation jobs using Python and Spark for real-time and batch processing
  • Designed and implemented real-time data streaming pipelines using Google Pub/Sub for ingesting and processing large volumes of data in a scalable manner
  • Configured message topics, subscriptions, and push/pull mechanisms to enable decoupled communication between services, improving system reliability
  • Leveraged Google Pub/Sub to integrate event-driven architectures and enable microservices communication in cloud-based applications
  • Implemented data transformation workflows using Google Dataform, enabling efficient data pipeline management and version control within cloud environments
  • Integrated Google Dataform with BigQuery for seamless management and automation of SQL-based data transformations
  • Collaborated with data engineering teams to create modular, maintainable, and scalable data models using Dataform’s SQL-based approach
  • Designed and built data ingestion pipelines to load structured and unstructured data into Google BigQuery, ensuring real-time data availability for analytics
  • Integrated diverse data sources (APIs, Cloud Storage, Pub/Sub) with BigQuery, optimizing the extraction and loading process for large-scale datasets
  • Architected cloud-based data storage solutions using Google Cloud Storage, facilitating secure and scalable storage for both structured and unstructured data
  • Managed access control policies and implemented data lifecycle management strategies within Cloud Storage to optimize storage costs and data retention
  • Leveraged Cloud Storage to support data lakes and backups, ensuring high availability and redundancy for critical datasets
  • Wrote manual SQL procedures for ETL and data-cleansing tasks in the ETL load to the ODS
  • Orchestrated and automated end-to-end data workflows using Cloud Composer, enabling seamless scheduling and monitoring of complex data pipelines
  • Integrated various Google Cloud services (BigQuery, Cloud Storage, Pub/Sub) into Cloud Composer to ensure efficient execution of multi-step data processing tasks
  • Streamlined operational processes by automating recurring tasks such as data loading, transformation, and reporting using Cloud Composer
  • Used Spark SQL to load Parquet data, created Datasets defined by case classes, managed structured data with Spark SQL, and stored the results in Hive tables for downstream consumption (a minimal sketch follows this section)
  • Managed version control for data engineering projects using GitHub, enabling collaboration, code reviews, and efficient team workflows
  • Implemented CI/CD pipelines for automated testing and deployment of data pipeline code, improving overall development efficiency and code quality
  • Documented and maintained repository guidelines, ensuring code consistency and best practices across development teams
  • Developed data pipelines using Apache Kafka to stream and process real-time data
  • Managed version control and deployment of Python-based ETL scripts in Git and Jenkins
  • Delivered technical documentation for ETL processes and cloud-based architecture
  • Actively participated in sprint planning and agile ceremonies to ensure timely delivery of data engineering tasks
  • Coordinated the migration of Oracle databases to GCP CloudSQL for PostgreSQL
  • Performed data copy from on-premises Oracle databases to GCP CloudSQL
  • Configured CDC for primary schemas to ensure data consistency and real-time updates
  • Conducted extensive query performance tuning to optimize database performance post-migration and collaborated with development teams to troubleshoot and resolve performance-related issues
  • Refactored and rewrote SQL code to enhance performance and maintainability in PostgreSQL
  • Utilized GCP tools (BigQuery) and services for efficient database management and monitoring
  • Implemented security measures to protect sensitive data during migration
  • Provided training and support to junior developers on best practices in database development and migration
  • Analyzed and optimized hardware configurations to improve database performance
  • Environment: Google Pub/Sub, ETL, SQL Developer, GCP, BigQuery, PostgreSQL 11.5/10.12/9.5, Spark, MySQL, SQL*Loader, Data Migration Services (DMS), Oracle SQL Developer, Ora2PG, DataStage, Confluence, Hive, Unix, Linux, Windows, Google Cloud, Microsoft Visual Studio, CI/CD, MS Excel, Python 3.8, Shell Scripting, Terraform, GitHub, DB2, Spark SQL, Spark DF/DS
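
Illustrative example for the Spark SQL bullet above (Parquet loaded, transformed with Spark SQL, and persisted to Hive): a minimal PySpark sketch, not the production pipeline. Bucket, database, and table names are placeholders, and the Scala case-class Datasets mentioned in the bullet are omitted.

    # Minimal PySpark sketch: load Parquet, transform with Spark SQL, store in Hive.
    # Paths, database, and table names below are placeholders.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("parquet-to-hive")
        .enableHiveSupport()  # needed so saveAsTable writes to the Hive metastore
        .getOrCreate()
    )

    # Load the raw Parquet data and expose it to Spark SQL.
    trades_df = spark.read.parquet("gs://example-bucket/raw/trades/")
    trades_df.createOrReplaceTempView("trades_raw")

    # Structured transformation expressed in Spark SQL.
    daily_summary = spark.sql("""
        SELECT trade_date,
               symbol,
               COUNT(*)      AS trade_count,
               SUM(notional) AS total_notional
        FROM trades_raw
        GROUP BY trade_date, symbol
    """)

    # Persist the result into a Hive table for downstream consumption.
    (
        daily_summary.write.mode("overwrite")
        .format("parquet")
        .saveAsTable("analytics.daily_trade_summary")
    )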

Sr. Data Engineer

UPMC (University of Pittsburgh Medical Center)
Pittsburgh, United States
03.2022 - 02.2024
  • Remote position
  • Gathering, analyzing, and documenting the business requirements and business rules by directly working with Business Analysts
  • Involved in the entire SDLC (Software Development Life Cycle) process that includes implementation, testing, deployment, documentation, training, and maintenance
  • Generated on-demand and scheduled reports for business analysis or management decision using SQL Server Reporting Services
  • Designed and developed AWS-based data pipeline solutions using AWS Lambda and Glue to process large volumes of healthcare data
  • Worked on ETL migration by developing and deploying AWS Lambda functions that generate a serverless data pipeline, writing to the Glue Data Catalog so the data can be queried from Athena
  • Worked with Healthcare Data/Claims 835 and 837 formats for analytical purposes, X12 Enterprise Data Integration (EDI), PHI
  • Worked with X12 EDI standards for healthcare data, HEDIS, and HIPAA
  • Developed complex ETL Packages using SQL Server 2019 Integration Services to load data from various sources like SQL Server/DB2 to Staging Database and then to Data Warehouse
  • Built the entire infrastructure that is required for optimal ETL from a variety of data sources using AWS, mainly with Pyspark
  • Used Spark-SQL to process the data and to run on Spark engine
  • Used Spark for improving performance and optimization on existing algorithms using Spark-SQL
  • Used the AWS Glue Data Catalog with crawlers to catalog data in S3 and ran SQL queries against it using AWS Athena
  • Created external tables with partitions using AWS Athena and Redshift
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them
  • Developed a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs
  • Created PySpark data frames to bring data from DB2 into Amazon S3 (a minimal sketch follows this section)
  • Created AWS Glue job for archiving data from Redshift tables to S3 (online to cold storage) as per data retention requirements
  • Created and enforced policies to achieve HIPAA compliance
  • Data mapping, migration, and conversion to Data Warehouse Platform
  • Defined and deployed monitoring, metrics, and logging systems on AWS
  • Performed logical and physical data modeling using Erwin for the data warehouse database in a star schema
  • Developed end to end ETL pipeline using Spark-SQL, Python and Salesforce on Spark engine
  • Developed Spark jobs, clean data from various feeds to make it suitable for ingestion and analysis
  • Imported data from various sources into Spark RDD for analysis
  • Worked on AWS Data Pipeline to configure data loads from S3 into Redshift
  • Created Python and UNIX shell scripts while interacting with different AWS services
  • Developed Complex mappings to feed Data warehouse & Data marts by extensively using Mapplets, and Transformations like Lookup, Filter, Router, Expression, Aggregator, Joiner, Stored Procedure and Update Strategy
  • Implemented Event Handlers, Package Configurations, Logging, System and User-defined Variables, Check Points and Expressions for SSIS/ ETL Packages
  • Created AWS S3 buckets, performed folder management in each bucket, managed cloud trail logs and objects within each bucket
  • Completed POC on usability of AWS using EC2 and S3 storage and lambda functions
  • Developed ETL processes for the data warehouse to clean and standardize data and metadata for loading into the data warehouse
  • Created external tables with partitions using Hive, AWS Athena, and Redshift
  • Optimized SQL queries using indexes and execution plans for maximum efficiency and performance
  • Automated mappings to run using UNIX shell scripts, which included Pre- and Post-session jobs and extracted data from Transaction System into Staging Area
  • Wrote scripts and indexing strategy for migration to AWS Redshift from SQL Server and MySQL databases
  • Modified SSIS ETL packages for changed business requirements to load data from various sources into SQL Server tables in the data warehouse
  • Performed Developer Testing, Functional testing, Unit testing and created Test Plans and Test Cases
  • Environment: ETL, MS SQL Server 2019/2017/2016/2012/2008 R2, MySQL, SSIS, Data Warehouse, Agile, PySpark, AWS Glue, AWS Batch, AWS S3, Athena, Redshift, Python 3.8, Shell Scripting, Spark 3.0, Spark SQL, Spark DF/DS
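
Illustrative example for the DB2-to-S3 bullet above: a minimal PySpark sketch using assumed placeholder connection details, table, and bucket names; it is not the actual UPMC pipeline, and it assumes the DB2 JDBC driver is on the cluster classpath.

    # Minimal PySpark sketch: pull a DB2 table over JDBC and land it in S3 as Parquet.
    # Host, credentials, table, and bucket names below are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("db2-to-s3-ingest").getOrCreate()

    # Read the source table from DB2 via the JDBC data source.
    claims_df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:db2://db2-host:50000/CLAIMSDB")
        .option("dbtable", "EDW.CLAIMS_835")
        .option("user", "etl_user")
        .option("password", "********")
        .option("driver", "com.ibm.db2.jcc.DB2Driver")
        .load()
    )

    # Light cleanup before landing the data.
    clean_df = claims_df.dropDuplicates()

    # Write partitioned Parquet to S3 so Glue crawlers and Athena can pick it up.
    (
        clean_df.write.mode("overwrite")
        .partitionBy("CLAIM_YEAR")
        .parquet("s3a://example-healthcare-raw/claims_835/")
    )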

SSIS/SQL/Azure SQL Data Integration Engineer

Gallagher Bassett
Rolling Meadows, United States
04.2018 - 03.2022
  • Gathering, analyzing, and documenting the business requirements and business rules by directly working with end users/clients
  • Created and maintained documents like Documentation Roadmap, DATA Models, ETL Execution Plan, Configuration Management Procedures, and Project Plan
  • Designed and developed various SSIS packages (ETL) to extract and transform data and involved in scheduling and deploying SSIS Packages
  • Extended SSIS capabilities by creating custom SSIS components for SharePoint
  • Implemented various tasks and transformations for data cleansing and performance tuning of SSIS packages
  • Extracted and Loaded SharePoint Lists and Surveys Data into SQL Server Database with SSIS
  • Created complex stored procedures, triggers, functions (UDFs), indexes, tables, views, and other T-SQL code and SQL joins for SSIS packages and SSRS reports
  • Worked on migration of data from On-prem SQL server to Cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB)
  • Configured SSIS packages with XML configuration files, environment variables, parent package variables, and SQL Server tables
  • Worked with SSIS Data flow task, Transformations, For-loop containers and Fuzzy Lookups, Data conversion and configured data in slowly changing dimensions
  • Performed SSIS development and support; developed ETL solutions for integrating data from multiple sources such as flat files (delimited, fixed width), Excel, SQL Server, raw files, and DB2 into the central OLTP database
  • Worked on creating dependencies between activities in Azure Data Factory
  • Propagated XML data through a policy engine to the master database and the data lineage database
  • Created stored procedures and scheduled them in the Azure environment
  • Monitored produced and consumed datasets in ADF
  • Created data factories in Azure Data Factory
  • Built complex ETL jobs that transform data visually with data flows or by using compute services such as Azure Databricks and Azure SQL Database
  • Created multiple datasets in Azure; worked on the migration of SQL Server 2008/2012/2014 to SQL Server 2017
  • Moved CSV files from Azure Blob Storage to Azure SQL Database (a minimal Databricks sketch follows this section)
  • Analyzed existing databases, tables, and other objects to prepare to migrate to Azure Synapse
  • Worked on on-prem data warehouse migration to Azure Synapse using PolyBase and ADF
  • Migrated on-prem ETLs from MS SQL Server to Azure using Azure Data Factory and Databricks
  • Involved in configuring Azure platform for data pipelines, ADF, Azure Blob Storage and Data Lakes and building workflows to automate data flow using ADF
  • Developed metadata management program with business and technical definitions, data lineage, life cycle, identify authoritative source
  • Analyzed data from different sources using the Hadoop big data stack, implementing Azure Data Factory, Azure Data Lake, Azure Synapse, Azure Data Lake Analytics, HDInsight, Hive, and Sqoop
  • Forecast data space requirements using capacity planning, mapped source-to-target data using the data lineage feature, and performed impact analysis
  • Created Azure data factories for loading data into Azure SQL Database from the Cosmos platform
  • Developed and documented data modeling standards to establish across the organization
  • Created and managed backend process to meet diverse business needs
  • Performed data loading, data import/export, data transmission
  • Involved in data modeling Star Schemas according to the Business requirements using Erwin 6.3.8
  • Worked extensively on system analysis, design, development, testing and implementation of projects
  • Assist with load testing activities by setting up database counters to monitor backend performance
  • Pulled data into MS SQL Server Database from different DBMS databases like Oracle and Teradata with ETL packages
  • Created and scheduled Various SQL Jobs, using SQL Server Agent to perform various administrative tasks
  • Environment: ETL, MS SQL Server 2017/2016/2012/2008, PDW, MS SQL Server Master Data Services, SQL Server Integration Services (SSIS), MS Visio, Azure Data Factory, Azure Databricks, MS Visual Studio/.NET 2008, TFS, VSTS, Azure DevOps, SharePoint
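
Illustrative example for the Blob-to-Azure-SQL bullet above: a minimal Databricks/PySpark sketch with placeholder storage account, container, server, and table names; authentication via an account key is an assumption (a SAS token or service principal would work similarly).

    # Minimal PySpark sketch: read CSV landing files from Azure Blob Storage and
    # append them into an Azure SQL Database staging table. All names are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("blob-csv-to-azure-sql").getOrCreate()

    # Authenticate to the storage account (account key assumed here).
    spark.conf.set(
        "fs.azure.account.key.examplestorage.blob.core.windows.net",
        "<storage-account-key>",
    )

    # Read the CSV files from the landing container.
    claims_df = (
        spark.read
        .option("header", True)
        .option("inferSchema", True)
        .csv("wasbs://landing@examplestorage.blob.core.windows.net/claims/*.csv")
    )

    # Append into the Azure SQL staging table over JDBC.
    (
        claims_df.write.format("jdbc")
        .option("url", "jdbc:sqlserver://example-server.database.windows.net:1433;database=ClaimsDW")
        .option("dbtable", "stg.Claims")
        .option("user", "etl_user")
        .option("password", "********")
        .mode("append")
        .save()
    )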

SQL Database Developer

Techno Vision Solutions
Farmington Hills, United States
01.2017 - 04.2018
  • Created Complex ETL Packages using SSIS for extracting, cleaning, transforming, and loading data from staging tables to partitioned tables with incremental load
  • Developed, deployed, and monitored SSIS Packages
  • Automated data processes for information input using SSIS
  • Developed SSIS packages to extract, transform and load data from Oracle and SQL Server databases into Data Warehouse
  • Deployed SSIS packages into various Environments (Dev, UAT and Prod) using Deployment Utility
  • Implemented data validation using T-SQL queries to confirm that the data returned by SSRS reports was correct
  • Designed Access databases for reporting and claims data analysis
  • Created tables and wrote table updates and inserts to manipulate the data
  • Created packages using various control flow tasks such as Data Flow Task, Execute SQL Task, Execute Package Task, and File System Task
  • Designed and implemented multiple dashboards for internal metrics using Azure Synapse - PowerPivot & Power Query tools
  • Involved in creation of Data Warehouse database Physical Model, Logical Model using Erwin data modeling tool
  • Designed the data warehouse and created the mappings from source to target tables
  • Created Indexes for faster data retrieval from the database
  • Developed Spark applications in Databricks using PySpark and Spark SQL to perform transformations and aggregations on source data before loading it into Azure Synapse Analytics for reporting (a minimal sketch follows this section)
  • Wrote stored procedures and functions for better performance and flexibility
  • Created SQL Server jobs and scheduled them to load data periodically using SQL server Agent
  • Transformed data from DB2 to SQL Server
  • Developed Aggregations, partitions, and calculated members for cube as per business requirements
  • Defined appropriate measure groups and KPIs and deployed cubes
  • Created parameterized, crosstab, drill-down, and summary reports using SSRS and created report snapshots to improve SSRS performance
  • Environment: ETL, MS SQL Server 2012/2008, MS SQL Server Master Data Services 2012, SQL Server Integration Services (SSIS), SSRS, MS Visual Studio/.NET 2008, BI Extractor, Microsoft Excel, Excel Add-In with Master Data
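
Illustrative example for the Databricks bullet above (transform and aggregate source data, then load it for reporting): a minimal PySpark sketch with placeholder paths, server, and table names; the JDBC write shown is one option, and the Databricks-native Synapse connector could be used instead.

    # Minimal PySpark sketch: aggregate curated data and load the summary into a
    # reporting database over JDBC. All paths and names below are placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("claims-summary-to-synapse").getOrCreate()

    # Source data already curated in the data lake as Parquet.
    claims_df = spark.read.parquet("/mnt/datalake/curated/claims/")

    # Aggregate for the reporting layer.
    summary_df = (
        claims_df.groupBy("claim_year", "provider_id")
        .agg(
            F.count("claim_id").alias("claim_count"),
            F.sum("paid_amount").alias("total_paid"),
        )
    )

    # Load the summary into the reporting schema.
    (
        summary_df.write.format("jdbc")
        .option("url", "jdbc:sqlserver://example-synapse.sql.azuresynapse.net:1433;database=ReportingDW")
        .option("dbtable", "rpt.ClaimSummary")
        .option("user", "etl_user")
        .option("password", "********")
        .mode("overwrite")
        .save()
    )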

Education

Master of Science - Computer Science

Silicon Valley University
San Jose, CA
01-2017

Skills

  • Real-time analytics
  • Scripting languages
  • Data migration
  • ETL development
  • SQL expertise
  • Data quality assurance
  • Performance tuning
  • Data governance
  • NoSQL databases
  • BigQuery
  • Google Cloud (GCP)
  • Azure Data Factory
  • Python & Spark
  • SQL & NoSQL
  • Data Migration & ETL
  • SQL Server 2019/2017/2016/14/12/08 R2
  • Azure SQL Data Warehouse
  • MySQL
  • Oracle 12c/11g
  • PostgreSQL 11.5, 10.12, 9.5
  • MS Access 2000/8.0
  • Teradata 14.0
  • DB2
  • SQL Server Management Studio
  • SSDT 2016/14/12
  • BIDS 2008R2/08
  • SQL Server Integration Services (SSIS 2019/2017/2016/2014/2008 R2)
  • SQL Server Reporting Services (SSRS 2016/2 R2)
  • Power BI
  • Azure Data Warehouse
  • Azure SQL
  • Azure Data Lake
  • Azure Databricks
  • AWS (Lambda, Glue, EMR, Redshift, S3)
  • Google Cloud (GCP) SQL
  • T-SQL
  • PL-SQL
  • PySpark
  • Spark with SQL
  • PL/pgSQL
  • Linux
  • Bash
  • Shell Scripting
  • Python
  • Scala
  • ETL
  • Data Modeling
  • Data Warehousing
  • Real-time Data Processing
  • Data Governance
  • Metadata management
  • Big data processing
  • Data security
  • Data pipeline control
  • Data integration
  • Data pipeline design
  • Spark framework

Skills To Prioritize

  • BigQuery
  • Google Cloud (GCP)
  • Azure Data Factory
  • Python & Spark
  • SQL & NoSQL
  • Data Migration & ETL

Cloud Technologies

  • Azure Data Warehouse
  • Azure SQL
  • Azure Data Lake
  • Azure Data Factory
  • Azure Databricks
  • AWS (Lambda, Glue, EMR, Redshift, S3)
  • Google Cloud (GCP)

Databases

  • SQL Server 2019/2017/2016/14/12/08 R2
  • Azure SQL Data Warehouse
  • MySQL
  • Oracle 12c/11g
  • PostgreSQL 11.5, 10.12, 9.5
  • MS Access 2000/8.0
  • Teradata 14.0
  • DB2
  • SQL Server Management Studio
  • SSDT 2016/14/12
  • BIDS 2008R2/08
  • SQL Server Integration Services (SSIS 2019/2017/2016/2014/2008 R2)
  • SQL Server Reporting Services (SSRS 2016/2 R2)
  • Power BI

Data Engineering

  • ETL
  • Data Modeling
  • Data Warehousing
  • Real-time Data Processing
  • Data migration
  • Data Governance
  • Real-time analytics
  • Data quality assurance
  • Performance tuning

Timeline

Sr. Data Engineer

CME (Chicago Mercantile Exchange)
03.2024 - Current

Sr. Data Engineer

UPMC (University of Pittsburgh Medical Center)
03.2022 - 02.2024

SSIS/SQL/Azure SQL Data Integration Engineer

Gallagher Bassett
04.2018 - 03.2022

SQL Database Developer

Techno Vision Solutions
01.2017 - 04.2018

Master of Science - Computer Science

Silicon Valley University