SHENTAN

Sr. Data Engineer

Summary

A highly skilled database professional with over a decade of experience monitoring and optimizing database environments. Brings a comprehensive understanding of database technologies and systems, strong analytical skills, and excellent problem-solving abilities, with a proven track record of troubleshooting complex database issues. Works effectively both independently and within teams, supported by strong communication skills. Committed to continuous learning and to delivering high-quality results in fast-paced, demanding environments. Organized and dependable, manages multiple priorities with a positive attitude and readily takes on added responsibilities to meet team goals.

Overview

12 years of professional experience

Work History

Sr. Data Engineer

CIBC
08.2023 - Current
  • Enhanced system performance by designing and implementing scalable data solutions for high-traffic applications.
  • Championed the adoption of agile methodologies within the team, resulting in faster delivery times and increased collaboration among team members.
  • Managed and organized data analytics projects within the Azure Databricks Lakehouse Platform.
  • Developed interactive notebooks using Spark SQL and PySpark for data analysis and visualization.
  • Configured and managed Azure Databricks Clusters for data processing, optimizing for performance and cost efficiency.
  • Developed and optimized data pipelines to import data from various sources to the data catalog and Hive metastore.
  • Utilized Event Hubs to capture and ingest streaming data from a variety of sources.
  • Used Azure Stream Analytics to process streaming data in real time and generate output streams.
  • Automated the execution of notebooks, scripts, and applications using Databricks Workflows.
  • Integrated Azure Databricks with Azure Data Lake Storage (ADLS) for secure data storage and management.
  • Implemented data import and export processes between Azure Databricks and Snowflake.
  • Leveraged the Delta Lake Time Travel feature for versioned schema rollbacks and historical data restoration.
  • Optimized Delta Lake tables and query performance through liquid clustering.
  • Utilized Azure DevOps to implement CI/CD for data engineering workloads in Azure Databricks.
  • Integrated Snowflake and Databricks ecosystems via connectors for seamless data transfer.
  • Designed and executed end-to-end data warehousing solutions on the Snowflake platform, including dynamic tables for complex data processing workflows.

Sr. Data Engineer

LinkedIn/HCL
02.2022 - 08.2023
  • Led the modernization of LinkedIn's computing infrastructure, transitioning from legacy systems to Spark and Trino, resulting in improved efficiency and integration.
  • Developed Spark infrastructure, user libraries, and tooling, directly impacting user experience.
  • Fostered a culture of knowledge sharing and efficiency through daily interactions and advocacy for Spark best practices.
  • Demonstrated the impact of optimizations through compelling visualizations, providing clear insights to stakeholders.
  • Designed and built a Jenkins-based CI/CD pipeline for continuous monitoring and project tracking.
  • Employed Grafana and LinkedIn's Observe platform for comprehensive data flow visualization and system metrics analysis.
  • Encouraged evidence-based decisions and practices, enhancing the team's analytical capabilities.
  • Achieved substantial resource savings, reducing operational expenses by approximately $4.27 million.
  • Managed over 3,000 migration tickets, demonstrating strong project management skills.
  • Leveraged Azure Data Lake and data warehousing concepts to design scalable data storage and retrieval solutions.
  • Actively participated in code reviews, promoting adherence to coding standards and knowledge sharing within the team.

Sr. Data Engineer

Tyler Technologies Inc
02.2018 - 01.2022
  • Developed Spark applications using Scala and Spark-SQL for data extraction, transformation, and aggregation from multiple data formats, enhancing data processing capabilities.
  • Designed and implemented ETL pipelines for petabyte-scale datasets using AWS Glue, resulting in a 50% reduction in data processing time and improved data consistency.
  • Utilized AWS Glue DataBrew for data cleansing and transformation, significantly improving data quality and accuracy.
  • Developed and deployed AWS Lambda functions for real-time data processing, increasing efficiency by 30%.
  • Designed a highly scalable data warehousing solution using Amazon Redshift, enabling efficient storage and analysis of large volumes of structured data.
  • Implemented complex transformations on streaming data using AWS Glue and AWS Lambda, improving real-time data processing speed by 40%.
  • Built a data pipeline using S3 and AWS Data Pipeline for seamless data integration, reducing manual intervention.
  • Implemented PySpark-based data quality checks and automated error handling, ensuring data consistency and accuracy.
  • Designed scalable and maintainable data models to support business intelligence initiatives and reporting needs.
  • Conducted extensive troubleshooting to identify the root causes of issues and implemented effective resolutions in a timely manner.

Data Engineer

Assurant
02.2016 - 02.2018
  • Designed and implemented a CI/CD pipeline using GitLab and Jenkins, ensuring efficient and seamless deployment of code changes.
  • Utilized Azure Databricks and Azure HDInsight to build, deploy, and manage Spark and Hadoop clusters for large-scale data processing and analytics tasks.
  • Developed and optimized data processing jobs for distributed computing platforms like Apache Hadoop and Apache Spark, resulting in faster processing times and efficient resource utilization.
  • Conducted performance tuning and optimization of Python scripts, leveraging techniques like caching, indexing, and parallelization to enhance data processing efficiency.
  • Developed and deployed machine learning models using MLflow on Databricks and frameworks such as scikit-learn and TensorFlow, contributing to predictive analytics capabilities.
  • Implemented real-time data processing and streaming solutions using Databricks Structured Streaming and Kafka, enhancing the organization's ability to handle real-time data.
  • Leveraged Agile development practices in data engineering workflows, including source control management, testing, and deployment automation, promoting a more efficient and collaborative work environment.

Data Engineer

Service Corporation International (SCI)
06.2015 - 02.2016
  • Worked on big data integration and analytics using Hadoop, Spark, Spark SQL, and NoSQL databases.

Software Developer (ETL)

AffiliateVIA Online Media Pvt. Ltd.
06.2012 - 07.2013
  • Managed and maintained Hadoop clusters, including tasks such as cluster monitoring, troubleshooting, and data backup management, ensuring optimal performance and data integrity.
  • Orchestrated data workflows using Oozie, enhancing the efficiency and reliability of data processing tasks.
  • Imported data from various RDBMS servers to HDFS using Sqoop, facilitating seamless data integration and enabling further ETL operations.
  • Developed Hive queries to assist market analysts in identifying emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Collaborated in an Agile environment with monthly iterations to ensure high-quality deliverables, and used Git for version control and team collaboration.
  • Optimized application performance by conducting regular code reviews and refactoring when necessary.
  • Created comprehensive documentation detailing software functionality for future reference or maintenance purposes.

Education

Master of Science - Computer Science

University of Central Missouri
Warrensburg, MO
01.2013 - 05.2015

Bachelor's Degree

Kakatiya University
Warangal, India
01.2008 - 05.2012

Skills

  • Python
  • Scala
  • Node.js
  • Unix shell scripting
  • Apache Hadoop
  • PySpark
  • Apache Kafka
  • Apache Avro
  • Oozie
  • HDFS
  • Hive
  • HBase
  • Amazon Athena
  • AWS Batch
  • Amazon CloudWatch
  • AWS CodeDeploy
  • AWS Data Pipeline
  • EC2
  • Amazon EMR
  • AWS Glue
  • AWS Lake Formation
  • AWS Lambda
  • Amazon Redshift
  • Amazon RDS
  • Amazon S3
  • Amazon Kinesis
  • Amazon QuickSight
  • Ansible
  • Docker
  • GitHub
  • JIRA
  • Jenkins
  • Terraform
  • ETL development
  • Data Warehousing
  • Data Security
  • Performance Tuning
  • Java
  • Data Modeling
  • Real-time Analytics
  • Advanced SQL
  • NoSQL Databases
  • Big Data Processing

Timeline

Sr. Data Engineer

CIBC
08.2023 - Current

Sr. Data Engineer

LinkedIn/HCL
02.2022 - 08.2023

Sr. Data Engineer

Tyler Technologies Inc
02.2018 - 01.2022

Data Engineer

Assurant
02.2016 - 02.2018

Data Engineer

Service Corporation International (SCI)
06.2015 - 02.2016

Master of Science - Computer Science

University of Central Missouri
01.2013 - 05.2015

Software Developer (ETL)

AffiliateVIA Online Media Pvt. Ltd.
06.2012 - 07.2013

Bachelor's Degree

Kakatiya University
01.2008 - 05.2012