
Nitish

TX

Summary

Senior data engineering professional with deep expertise in data architecture, pipeline development, and big data technologies. Proven track record of optimizing data workflows, enhancing system efficiency, and driving business intelligence initiatives. Skilled in SQL, Python, Spark, and cloud platforms, with a strategic approach to data management and problem-solving. Experienced leader with a strong background in guiding teams, managing complex projects, developing efficient processes, and aligning efforts with organizational goals. Strong collaborator, adaptable to evolving project demands, and committed to delivering impactful results through teamwork, high standards, and innovation.

Overview

11 years of professional experience

Work History

Senior Data Engineer

Cigna
10.2022 - Current
  • Designed, implemented, and maintained real-time and batch data processing pipelines using Azure Databricks and Apache Spark
  • Collaborated with data scientists and analysts to understand data requirements and translate them into scalable solutions
  • Led end-to-end data engineering initiatives using Azure Databricks and Azure Data Factory, delivering high-quality data solutions
  • Designed, developed, and maintained Azure Cosmos DB and Azure Data Explorer solutions for high-throughput data ingestion and real-time analytics
  • Architected data models and partitioning strategies in Azure Cosmos DB, resulting in enhanced performance and reduced costs
  • Collaborated with data scientists and analysts to develop optimal query patterns, ensuring efficient data retrieval for reporting and analysis
  • Implemented custom connectors and integrations to ingest data from various sources, enhancing the overall data ecosystem
  • Provided technical leadership and guidance to junior data engineers, fostering a culture of innovation and knowledge sharing
  • Designed and implemented complex ETL pipelines for batch and real-time processing, ensuring data accuracy and integrity
  • Collaborated closely with data scientists and analysts to understand data needs and translate them into technical requirements
  • Optimized data storage solutions with Azure Data Lake Storage and Azure SQL Data Warehouse for improved performance
  • Provided technical leadership, mentoring, and training to junior team members
  • Implemented data partitioning and optimization strategies, resulting in a 30% reduction in pipeline processing time
  • Developed custom UDFs (User-Defined Functions) to perform complex transformations and data enrichment tasks
  • Implemented data lineage tracking and documentation, ensuring data governance and compliance with industry standards
  • Led the adoption of Terraform for infrastructure provisioning, reducing deployment time by 50% and ensuring consistency across multiple environments
  • Designed and implemented complex Azure and GCP cloud architectures using Terraform modules to support scalable and resilient applications
  • Collaborated with development and operations teams to establish best practices for Terraform code organization and versioning using Git
  • Automated the creation of CI/CD pipelines for Terraform code validation and deployment, enhancing the development workflow
  • Contributed to the development of a real-time fraud detection system utilizing Azure Databricks and Spark Streaming
  • Participated in the creation of data ingestion pipelines for high-velocity streaming data processing
  • Engaged in code reviews and collaborated with senior engineers to enhance code quality and performance
  • Designed and executed automated test suites to ensure the reliability and accuracy of data processing workflows
  • Automated resulting scripts and workflows using Apache Airflow and shell scripting to ensure daily execution in production
  • Queried both Managed and External tables created by Hive using Impala
  • Extracted large volumes of data and analyzed complex business logic to derive business-oriented insights, recommending and proposing new solutions to the business in Excel reports
  • Tuned Spark application performance by setting the right batch interval time, the correct level of parallelism, and appropriate memory configuration
  • Encoded and decoded objects using PySpark to create and modify the data frames in Apache Spark
  • Created Build and Release for multiple projects (modules) in production environment using Visual Studio Team Services (VSTS)
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python
  • Developed PySpark and SparkSQL code to process the data in Apache Spark on Amazon EMR to perform the necessary transformations based on the STMs developed
  • Designed, developed, and managed Power BI, Tableau, QlikView, and Qlik Sense apps, including dashboards, reports, and storytelling
  • Created a new 13-page Power BI dashboard according to the design spec in two weeks, beating a tight timeline
  • Deployed an automation to production that updates the company holiday schedule based on the company's holiday policy, which needs to be updated yearly
  • Used Informatica Power Center for extraction, transformation, and loading (ETL) of data in the data warehouse
  • Loaded data into Snowflake tables from internal stages using SnowSQL
  • Prepared the data warehouse in Snowflake using star and snowflake schema concepts and SnowSQL
  • Prepared Tableau reports and dashboards with calculated fields, parameters, sets, groups, and bins, and published them to the server

Senior AWS Data Engineer

PNC Bank
02.2021 - 10.2022
  • Developed Hive ETL logic for cleansing and transforming data coming from RDBMS sources
  • Utilized PySpark and Spark SQL to develop Spark applications for data extraction, transformation, and aggregation, leading to insights into customer usage patterns
  • Analyzed SQL scripts and designed solutions for implementation using PySpark
  • Designed and implemented data warehousing solutions using Snowflake, leveraging its unique multi-cluster architecture for scalable and elastic data processing
  • Tuned Spark application performance to optimize batch interval time, parallelism, and memory usage for enhanced efficiency
  • Built AWS CI/CD data pipelines and an AWS data lake using EC2, AWS Glue, and AWS Lambda
  • Implemented data quality checks and validation processes within Databricks to ensure the accuracy and reliability of data
  • Developed Scala and PySpark User-Defined Functions (UDFs) to meet specific business requirements
  • Implemented Kinesis Data Streams to read real-time data and loaded it into S3 for downstream processing
  • Imported data into HDFS from various SQL databases and files using Sqoop and from streaming systems using Storm into Big Data Lake
  • Implemented data validation and testing frameworks within DBT to ensure high data quality and integrity
  • Created AWS Lambda functions using Python for deployment management in AWS; designed and implemented public-facing websites on AWS and integrated them with other application infrastructure
  • Implemented NiFi pipelines to export data from HDFS to cloud locations like AWS
  • Developed SQL scripts for automation, contributing to improved workflow efficiency
  • Implemented metadata management and data cataloging solutions within Databricks to enhance data discoverability and lineage tracking
  • Worked with Visual Studio Team Services (VSTS) to create build and release processes for multiple projects in a production environment
  • Created an ETL framework to hydrate the data lake using PySpark
  • Collaborated with the Big Data Architecture Team to lay the foundation for an Enterprise Analytics initiative in a Hadoop-based Data Lake
  • Developed and implemented Machine Learning algorithms for predictive modeling
  • Extensively utilized SSIS transformations for data integration and processing
  • Extracted data from various APIs and performed data cleansing and processing using Java and Scala
  • Worked on multiple parts of data lake implementation and maintenance for ETL processing
  • Created and managed ETL processes using SnowSQL, Snowflake's command-line client, to extract, transform, and load data from various source systems into Snowflake data warehouses
  • Developed Spark applications in Java on distributed environments, efficiently loading CSV files into Hive ORC tables
  • Developed Spark jobs using Scala and Python on Yarn for interactive and batch analysis
  • Developed data pipelines using Kafka, Spark, and Hive for data ingestion, transformation, and analysis
  • Collaborated with systems engineering teams to plan and deploy new Hadoop environments and expand existing Hadoop clusters

Azure Data Engineer

The Home Depot
12.2019 - 02.2021
  • Created Spark jobs by writing RDDs in Python, created DataFrames in Spark SQL to perform data analysis, and stored the results in Azure Data Lake
  • Provided expert guidance to clients in architecting, developing, and deploying data solutions on Azure Databricks and HDInsight
  • Conducted performance tuning and optimization of existing Spark jobs, resulting in a 25% reduction in resource utilization
  • Provided expert consultation to clients in designing and implementing Azure Cosmos DB and Azure Data Explorer solutions
  • Designed scalable data ingestion pipelines, optimizing Cosmos DB throughput and partitioning for high-volume data
  • Developed real-time dashboards and visualizations using Azure Data Explorer, enabling quick insights into data trends
  • Collaborated with client teams to identify data integration opportunities, resulting in the consolidation of data sources for improved analysis
  • Conducted training sessions and workshops to educate client teams on best practices and advanced features of Azure data services
  • Managed a team of skilled data engineers, overseeing the design and implementation of data solutions
  • Set and enforced coding standards and best practices, ensuring consistent and high-quality data engineering processes
  • Collaborated closely with solution architects to design efficient data processing architectures, optimizing resource utilization
  • Led the adoption of Azure Data Factory and Databricks, enhancing data processing efficiency and capabilities
  • Provided technical guidance and coaching to team members, fostering continuous skill development
  • Implemented ETL processes to extract data from various sources, including REST APIs, databases, and streaming platforms
  • Created and maintained technical documentation and conducted knowledge-sharing sessions for client teams
  • Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in HDFS using Scala
  • Developed Spark applications using Kafka and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources
  • Created various data pipelines using Spark, Scala and SparkSQL for faster processing of data
  • Designed batch processing jobs using Apache Spark to increase speed compared to that of MapReduce jobs
  • Wrote Spark SQL and embedded the SQL in Scala files to generate JAR files for submission to the Hadoop cluster
  • Developed data pipeline using Flume to ingest data and customer histories into HDFS for analysis
  • Executed Spark SQL operations on JSON, transformed the data into a tabular structure using DataFrames, and stored and wrote the data to Hive and HDFS
  • Worked with Hive data warehouse infrastructure: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HQL queries
  • Created Hive tables as internal or external per requirements, defined with appropriate static or dynamic partitions and bucketing for efficiency
  • Involved in moving all log files generated from various sources to HDFS for further processing through Kafka
  • Extracted real-time data using Kafka and Spark Streaming by creating DStreams, converting them into RDDs, and processing and storing the results
  • Used the Spark SQL Scala interface, which automatically converts RDDs of case classes to schema RDDs
  • Extracted source data from sequential, XML, and CSV files, then transformed and loaded it into the target data warehouse
  • Demonstrated solid understanding of NoSQL databases (MongoDB and Cassandra)
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs, Spark SQL, and Scala; extracted large datasets from Cassandra and Oracle servers into HDFS and vice versa using Sqoop
  • Migrated the platform from Cloudera to EMR
  • Developed analytical component using Scala, Spark and Spark Streaming
  • Developed ETL processes to load data from multiple data sources into HDFS using Flume and performed structural modifications using Hive
  • Provided technical solutions on MS Azure HDInsight, Hive, HBase, MongoDB, Telerik, Power BI, Spotfire, Tableau, Azure SQL Data Warehouse data migration using BCP, Azure Data Factory, and fraud prediction using Azure Machine Learning

Azure Data Engineer

ZSoft Technologies
11.2016 - 06.2019
  • Collaborated with business users, product owners, and developers to contribute to the analysis of functional requirements
  • Implemented Spark SQL queries that combined Hive queries with programmatic Python data manipulations supported by RDDs and DataFrames
  • Designed and developed data pipelines using Azure Data Factory, orchestrating data movement and transformations across sources
  • Collaborated with data analysts to understand data requirements and implemented optimized data processing workflows
  • Utilized Azure Databricks to process and transform extensive datasets, improving data processing speed and efficiency
  • Conducted troubleshooting and root cause analysis of data pipeline failures, implementing preventive measures
  • Configured Spark Streaming to consume Kafka streams and store the data in HDFS
  • Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the data in HDFS
  • Developed Spark scripts and UDFs using Spark SQL queries for data aggregation and querying, and wrote data back into RDBMS through Sqoop
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing
  • Installed and configured Pig and wrote Pig Latin scripts
  • Wrote MapReduce jobs using Pig Latin
  • Worked on analyzing Hadoop clusters using different big data analytic tools including HBase database and Sqoop
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop for analysis, visualization, and report generation
  • Created Hive tables and dynamically inserted data using partitioning and bucketing for EDW tables and historical metrics
  • Handled large datasets during ingestion using partitions, Spark in-memory capabilities, broadcasts, and effective and efficient joins and transformations
  • Created ETL packages with different data sources (SQL Server, Oracle, Flat files, Excel, DB2, and Teradata) and loaded the data into target tables by performing different kinds of transformations using SSIS
  • Designed and developed data integration programs in a Hadoop environment with the NoSQL data store Cassandra for data access and analysis
  • Created partitions and bucketing across state in Hive to handle structured data using Elasticsearch
  • Performed Sqoop transfers through HBase tables, processing data into several NoSQL databases: Cassandra and MongoDB

Data Analyst

IncraSoft Pvt Ltd
08.2013 - 10.2016
  • Designed physical and logical data models using the ERwin data modeling tool
  • Designed the relational data model for the operational data store and staging areas, and designed dimension and fact tables for data marts
  • Extensively used ERwin Data Modeler for logical/physical data modeling and relational database design
  • Created Stored Procedures, Database Triggers, Functions and Packages to manipulate the database and to apply the business logic according to the user's specifications
  • Created Triggers, Views, Synonyms and Roles to maintain integrity plan and database security
  • Created database links to connect to other servers and access required information
  • Integrity constraints, database triggers and indexes were planned and created to maintain data integrity and to facilitate better performance
  • Used Advanced Queuing for exchanging messages and communicating between different modules
  • Performed system analysis and design for enhancements; tested forms, reports, and user interaction

Education

Bachelor of Science - Computer Science

JNTU
Hyderabad, Telangana
05-2013

Skills

  • Git version control
  • ETL development
  • Big data processing
  • Python programming
  • NoSQL databases
  • Kafka streaming
  • Data modeling
  • API development
  • Data warehousing
  • Spark development
  • Machine learning
  • Advanced SQL
  • Java development
  • SQL and databases
  • Data analysis
  • Business intelligence

Timeline

Senior Data Engineer

Cigna
10.2022 - Current

Senior AWS Data Engineer

PNC Bank
02.2021 - 10.2022

Azure Data Engineer

The Home Depot
12.2019 - 02.2021

Azure Data Engineer

ZSoft Technologies
11.2016 - 06.2019

Data Analyst

IncraSoft Pvt Ltd
08.2013 - 10.2016

Bachelor of Science - Computer Science

JNTU
05-2013