
Nitish

TX

Summary

Senior data engineering professional with deep expertise in data architecture, pipeline development, and big data technologies. Proven track record of optimizing data workflows, enhancing system efficiency, and driving business intelligence initiatives. Skilled in SQL, Python, Spark, and cloud platforms, with a strategic approach to data management and problem-solving. Experienced leader with a strong background in guiding teams, managing complex projects, developing efficient processes, and aligning efforts with organizational goals. Strong collaborator, adaptable to evolving project demands, and committed to delivering impactful results through teamwork, high standards, and innovation.

Overview

11 years of professional experience

Work History

Senior Data Engineer

Cigna
10.2022 - Current
  • Designed, implemented, and maintained real-time and batch data processing pipelines using Azure Databricks and Apache Spark
  • Collaborated with data scientists and analysts to understand data requirements and translate them into scalable solutions
  • Led end-to-end data engineering initiatives using Azure Databricks and Azure Data Factory, delivering high-quality data solutions
  • Designed, developed, and maintained Azure Cosmos DB and Azure Data Explorer solutions for high-throughput data ingestion and real-time analytics
  • Architected data models and partitioning strategies in Azure Cosmos DB, resulting in enhanced performance and reduced costs
  • Collaborated with data scientists and analysts to develop optimal query patterns, ensuring efficient data retrieval for reporting and analysis
  • Implemented custom connectors and integrations to ingest data from various sources, enhancing the overall data ecosystem
  • Provided technical leadership and guidance to junior data engineers, fostering a culture of innovation and knowledge sharing
  • Designed and implemented complex ETL pipelines for batch and real-time processing, ensuring data accuracy and integrity
  • Collaborated closely with data scientists and analysts to understand data needs and translate them into technical requirements
  • Optimized data storage solutions with Azure Data Lake Storage and Azure SQL Data Warehouse for improved performance
  • Provided technical leadership, mentoring, and training to junior team members
  • Implemented data partitioning and optimization strategies, resulting in a 30% reduction in pipeline processing time
  • Developed custom UDFs (User-Defined Functions) to perform complex transformations and data enrichment tasks
  • Implemented data lineage tracking and documentation, ensuring data governance and compliance with industry standards
  • Led the adoption of Terraform for infrastructure provisioning, reducing deployment time by 50% and ensuring consistency across multiple environments
  • Designed and implemented complex Azure and GCP cloud architectures using Terraform modules to support scalable and resilient applications
  • Collaborated with development and operations teams to establish best practices for Terraform code organization and versioning using Git
  • Automated the creation of CI/CD pipelines for Terraform code validation and deployment, enhancing the development workflow
  • Contributed to the development of a real-time fraud detection system utilizing Azure Databricks and Spark Streaming
  • Participated in the creation of data ingestion pipelines for high-velocity streaming data processing
  • Engaged in code reviews and collaborated with senior engineers to enhance code quality and performance
  • Designed and executed automated test suites to ensure the reliability and accuracy of data processing workflows
  • Automated resulting scripts and workflows using Apache Airflow and shell scripting to ensure daily execution in production
  • Queried both Managed and External tables created by Hive using Impala
  • Extracted large volumes of data and analyzed complex business logic to derive business-oriented insights, recommending and proposing new solutions to the business in Excel reports
  • Tuned Spark application performance by setting the right batch interval time, the correct level of parallelism, and appropriate memory configuration
  • Encoded and decoded objects using PySpark to create and modify the data frames in Apache Spark
  • Created Build and Release for multiple projects (modules) in production environment using Visual Studio Team Services (VSTS)
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python
  • Developed PySpark and SparkSQL code to process the data in Apache Spark on Amazon EMR to perform the necessary transformations based on the STMs developed
  • Designed, developed, and managed Power BI, Tableau, QlikView, and Qlik Sense apps, including dashboards, reports, and storytelling
  • Created a new 13-page Power BI dashboard according to the design spec in two weeks, beating a tight timeline
  • Deployed an automation to production that updates the company holiday schedule based on the company's holiday policy, which needs to be updated yearly
  • Used Informatica Power Center for extraction, transformation, and loading (ETL) of data in the data warehouse
  • Loaded data into Snowflake tables from internal stages using SnowSQL
  • Prepared the data warehouse in Snowflake using star and snowflake schema concepts and SnowSQL
  • Prepared Tableau reports and dashboards with calculated fields, parameters, sets, groups, and bins, and published them to the server

Senior AWS Data Engineer

PNC Bank
02.2021 - 10.2022
  • Developed Hive ETL logic for cleansing and transforming data coming from RDBMS sources
  • Utilized PySpark and Spark SQL to develop Spark applications for data extraction, transformation, and aggregation, leading to insights into customer usage patterns
  • Analyzed SQL scripts and designed solutions for implementation using PySpark
  • Designed and implemented data warehousing solutions using Snowflake, leveraging its unique multi-cluster architecture for scalable and elastic data processing
  • Tuned Spark application performance to optimize batch interval time, parallelism, and memory usage for enhanced efficiency
  • Built AWS CI/CD data pipelines and an AWS data lake using EC2, AWS Glue, and AWS Lambda
  • Implemented data quality checks and validation processes within Databricks to ensure the accuracy and reliability of data
  • Developed Scala and PySpark User-Defined Functions (UDFs) to meet specific business requirements
  • Implemented Kinesis Data Streams to read real-time data and loaded it into S3 for downstream processing
  • Imported data into HDFS from various SQL databases and files using Sqoop and from streaming systems using Storm into Big Data Lake
  • Implemented data validation and testing frameworks within DBT to ensure high data quality and integrity
  • Created AWS Lambda functions using Python for deployment management in AWS; designed and implemented public-facing websites on AWS and integrated them with other application infrastructure
  • Implemented NiFi pipelines to export data from HDFS to cloud locations like AWS
  • Developed SQL scripts for automation, contributing to improved workflow efficiency
  • Implemented metadata management and data cataloging solutions within Databricks to enhance data discoverability and lineage tracking
  • Worked with Visual Studio Team Services (VSTS) to create build and release processes for multiple projects in a production environment
  • Created an ETL framework to hydrate the data lake using PySpark
  • Collaborated with the Big Data Architecture Team to lay the foundation for an Enterprise Analytics initiative in a Hadoop-based Data Lake
  • Developed and implemented Machine Learning algorithms for predictive modeling
  • Extensively utilized SSIS transformations for data integration and processing
  • Extracted data from various APIs and performed data cleansing and processing using Java and Scala
  • Worked on multiple parts of data lake implementation and maintenance for ETL processing
  • Created and managed ETL processes using SnowSQL, Snowflake's command-line client, to extract, transform, and load data from various source systems into Snowflake data warehouses
  • Developed Spark applications in Java on distributed environments, efficiently loading CSV files into Hive ORC tables
  • Developed Spark jobs using Scala and Python on Yarn for interactive and batch analysis
  • Developed data pipelines using Kafka, Spark, and Hive for data ingestion, transformation, and analysis
  • Collaborated with systems engineering teams to plan and deploy new Hadoop environments and expand existing Hadoop clusters

Azure Data Engineer

The Home Depot
12.2019 - 02.2021
  • Created Spark jobs by writing RDDs in Python, created DataFrames in Spark SQL to perform data analysis, and stored the results in Azure Data Lake
  • Provided expert guidance to clients in architecting, developing, and deploying data solutions on Azure Databricks and HDInsight
  • Conducted performance tuning and optimization of existing Spark jobs, resulting in a 25% reduction in resource utilization
  • Provided expert consultation to clients in designing and implementing Azure Cosmos DB and Azure Data Explorer solutions
  • Designed scalable data ingestion pipelines, optimizing Cosmos DB throughput and partitioning for high-volume data
  • Developed real-time dashboards and visualizations using Azure Data Explorer, enabling quick insights into data trends
  • Collaborated with client teams to identify data integration opportunities, resulting in the consolidation of data sources for improved analysis
  • Conducted training sessions and workshops to educate client teams on best practices and advanced features of Azure data services
  • Managed a team of skilled data engineers, overseeing the design and implementation of data solutions
  • Set and enforced coding standards and best practices, ensuring consistent and high-quality data engineering processes
  • Collaborated closely with solution architects to design efficient data processing architectures, optimizing resource utilization
  • Led the adoption of Azure Data Factory and Databricks, enhancing data processing efficiency and capabilities
  • Provided technical guidance and coaching to team members, fostering continuous skill development
  • Implemented ETL processes to extract data from various sources, including REST APIs, databases, and streaming platforms
  • Created and maintained technical documentation and conducted knowledge-sharing sessions for client teams
  • Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in HDFS using Scala
  • Developed Spark applications using Kafka and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources
  • Created various data pipelines using Spark, Scala and SparkSQL for faster processing of data
  • Designed batch processing jobs using Apache Spark to increase speed compared to that of MapReduce jobs
  • Wrote Spark SQL and embedded the SQL in Scala files to generate JAR files for submission to the Hadoop cluster
  • Developed data pipeline using Flume to ingest data and customer histories into HDFS for analysis
  • Executed Spark SQL operations on JSON, transformed the data into a tabular structure using DataFrames, and stored and wrote the data to Hive and HDFS
  • Worked with Hive data warehouse infrastructure: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HQL queries
  • Created Hive tables as internal or external per requirements, defined with appropriate static or dynamic partitions and bucketing for efficiency
  • Involved in moving all log files generated from various sources to HDFS for further processing through Kafka
  • Extracted real-time data using Kafka and Spark Streaming by creating DStreams, converting them into RDDs, and processing and storing the results
  • Used the Spark SQL Scala interface, which automatically converts RDDs of case classes to schema RDDs
  • Extracted source data from sequential, XML, and CSV files, then transformed and loaded it into the target data warehouse
  • Demonstrated solid understanding of NoSQL databases (MongoDB and Cassandra)
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs, Spark SQL, and Scala; extracted large datasets from Cassandra and Oracle servers into HDFS and vice versa using Sqoop
  • Migrated the platform from Cloudera to EMR
  • Developed analytical component using Scala, Spark and Spark Streaming
  • Developed ETL processes to load data from multiple data sources into HDFS using Flume and performed structural modifications using Hive
  • Provided technical solutions on MS Azure HDInsight, Hive, HBase, MongoDB, Telerik, Power BI, Spotfire, Tableau, Azure SQL Data Warehouse data migration using BCP, Azure Data Factory, and fraud prediction using Azure Machine Learning

Azure Data Engineer

ZSoft Technologies
11.2016 - 06.2019
  • Collaborated with business users, product owners, and developers to contribute to the analysis of functional requirements
  • Implemented Spark SQL queries that combined Hive queries with programmatic Python data manipulations supported by RDDs and DataFrames
  • Designed and developed data pipelines using Azure Data Factory, orchestrating data movement and transformations across sources
  • Collaborated with data analysts to understand data requirements and implemented optimized data processing workflows
  • Utilized Azure Databricks to process and transform extensive datasets, improving data processing speed and efficiency
  • Conducted troubleshooting and root cause analysis of data pipeline failures, implementing preventive measures
  • Configured Spark Streaming to consume Kafka streams and store the data in HDFS
  • Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the data in HDFS
  • Developed Spark scripts and UDFs using Spark SQL queries for data aggregation and querying, and wrote data back into RDBMS through Sqoop
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing
  • Installed and configured Pig and wrote Pig Latin scripts
  • Wrote MapReduce jobs using Pig Latin
  • Worked on analyzing Hadoop clusters using different big data analytic tools including HBase database and Sqoop
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop for analysis, visualization, and report generation
  • Created Hive tables and dynamically inserted data using partitioning and bucketing for EDW tables and historical metrics
  • Handled large datasets during ingestion using partitions, Spark in-memory capabilities, broadcasts, and effective and efficient joins and transformations
  • Created ETL packages with different data sources (SQL Server, Oracle, Flat files, Excel, DB2, and Teradata) and loaded the data into target tables by performing different kinds of transformations using SSIS
  • Designed and developed data integration programs in a Hadoop environment with the NoSQL data store Cassandra for data access and analysis
  • Created partitions and bucketing across state in Hive to handle structured data using Elasticsearch
  • Performed Sqoop transfers through HBase tables, processing data into several NoSQL databases: Cassandra and MongoDB

Data Analyst

IncraSoft Pvt Ltd
08.2013 - 10.2016
  • Designed physical and logical data models using the ERwin data modeling tool
  • Designed the relational data model for the operational data store and staging areas, and designed dimension and fact tables for data marts
  • Extensively used ERwin Data Modeler for logical/physical data modeling and relational database design
  • Created Stored Procedures, Database Triggers, Functions and Packages to manipulate the database and to apply the business logic according to the user's specifications
  • Created Triggers, Views, Synonyms and Roles to maintain integrity plan and database security
  • Created database links to connect to other servers and access required information
  • Integrity constraints, database triggers and indexes were planned and created to maintain data integrity and to facilitate better performance
  • Used Advanced Queuing for exchanging messages and communicating between different modules
  • Performed system analysis and design for enhancements; tested forms, reports, and user interaction

Education

Bachelor of Science - Computer Science

JNTU
Hyderabad, Telangana
05-2013

Skills

  • Git version control
  • ETL development
  • Big data processing
  • Python programming
  • NoSQL databases
  • Kafka streaming
  • Data modeling
  • API development
  • Data warehousing
  • Spark development
  • Machine learning
  • Advanced SQL
  • Java development
  • SQL and databases
  • Data analysis
  • Business intelligence

Timeline

Senior Data Engineer

Cigna
10.2022 - Current

Senior AWS Data Engineer

PNC Bank
02.2021 - 10.2022

Azure Data Engineer

The Home Depot
12.2019 - 02.2021

Azure Data Engineer

ZSoft Technologies
11.2016 - 06.2019

Data Analyst

IncraSoft Pvt Ltd
08.2013 - 10.2016

Bachelor of Science - Computer Science

JNTU
05-2013