Santosh Jena

Summary

19+ years of experience in Data Engineering, Data Analytics, and Cloud Technologies. Proven expertise in designing and optimizing data pipelines, ETL workflows, and analytical solutions to drive business insights. Skilled in Python, PySpark, AWS, and Hadoop, with a strong background in data warehousing, cloud migration, and DevOps. Adept at managing large-scale data processing, stakeholder engagement, and vendor management while ensuring high-performance, scalable, and cost-effective solutions. Passionate about leveraging data to enable decision-making and innovation.

Overview

19 years of professional experience

Work History

Senior Data Engineer

TABNER INC
04.2024 - Current
  • Client: Genentech
  • Developed and optimized AWS Glue-based ETL pipelines for processing large-scale pharmaceutical data, ensuring high performance and reliability.
  • Manipulated and analyzed large or complex data sets using relevant techniques.
  • Utilized AWS SageMaker for building and deploying machine learning models to support drug research, clinical trials, and patient analytics.
  • Designed and implemented Athena-based query solutions for real-time data analysis.
  • Automated data ingestion, transformation, and validation using Python and PySpark, enhancing efficiency and data accuracy.
  • Performed data validation and reconciliation to ensure the integrity and consistency of migrated data within AWS and on-premise environments.
  • Collaborated with cross-functional teams including data scientists, engineers, and business stakeholders to drive insights and improve operational efficiency.
  • Analyzed ETL processes to extract, transform, and load pharmaceutical data into AWS Redshift and S3.

Senior Data Engineer

TABNER INC
10.2023 - 04.2024
  • Client: DICKS SPORTING GOODS
  • Analyzed sales data to identify opportunities for revenue growth and operational efficiency improvements.
  • Collaborated with cross-functional teams including data engineers, architects, and business stakeholders to ensure successful migration outcomes.
  • Designed and implemented data pipelines to extract, transform, and load data into Snowflake using Snowflake utilities and ETL tools.
  • Automated data extraction and reporting processes using Python scripts, reducing manual effort and improving efficiency.
  • Performed data validation and reconciliation to verify the integrity of migrated data and ensure consistency with source systems.
  • Analyzed ETL processes to extract, transform, and load data into Snowflake as part of modernization.

Technical Lead

TABNER INC
03.2022 - 10.2023
  • Client: Apple
  • Partnered with cross-functional teams and business stakeholders to design and implement scalable data pipelines using AWS Glue and PySpark, enhancing data integration and processing efficiency.
  • Designed and deployed end-to-end ETL workflows leveraging AWS Glue, S3, and Athena to support real-time and batch data processing, improving data availability for analytics and reporting.
  • Developed reusable ETL scripts and automation logic to reduce manual intervention, streamline data ingestion processes, and ensure consistent data quality.
  • Led optimization initiatives for AWS Glue jobs and Spark performance tuning, resulting in a 40% reduction in ETL processing time and improved job reliability.
  • Collaborated with data analysts and data scientists to deliver curated datasets, enabling advanced analytics and machine learning model development.
  • Implemented data quality checks and monitoring frameworks to proactively identify anomalies and bottlenecks in data pipelines.
  • Delivered performance dashboards and operational KPIs using tools like QuickSight and Redshift, enabling data-driven decision-making for senior leadership.
  • Conducted root cause analysis for data pipeline failures, driving the resolution of recurring issues and implementing long-term improvements in data architecture.

Data Engineering Manager

Accenture
11.2015 - 01.2022
  • Client: JP MORGAN
  • Designed and developed ETL processes and data pipelines to ingest, transform, and load data from various sources into data warehouses and data lakes.
  • Implemented real-time data streaming solutions using Apache Kafka and Spark Streaming for processing and analyzing streaming data.
  • Conducted performance tuning and optimization of data processing workflows to improve efficiency and reduce processing times.
  • Collaborated with data architects and database administrators to design and implement data models and schemas for optimal performance and scalability.
  • Provided technical guidance and support to junior data engineers, ensuring adherence to best practices and standards.
  • Implemented database migrations and data modeling using Django ORM to ensure data integrity and consistency.

Technical Lead

WIPRO
06.2010 - 10.2015
  • Client: TIVO
  • Led the design and development of automated test frameworks using Python.
  • Implemented best practices for test automation, including framework architecture, coding standards, and documentation.
  • Collaborated with cross-functional teams to define test strategies, identify automation opportunities, and prioritize test coverage.
  • Developed and executed automated test scripts for functional, regression, and integration testing.
  • Integrated automated tests into CI/CD pipelines for continuous testing and deployment.
  • Mentored junior automation engineers and provided technical guidance and support.

Senior Software Engineer

ITC Infotech
01.2006 - 12.2009
  • Client: IBM
  • Scripted test cases in Python, Shell, and Perl to automate system configuration, deployment, and validation tasks.
  • Collaborated with AIX system administrators and developers to identify test scenarios, prioritize test coverage, and resolve issues.
  • Integrated automated tests into CI/CD pipelines for continuous testing and deployment of AIX-based applications.
  • Conducted root cause analysis and provided recommendations for improving AIX system reliability, security, and performance.
  • Conducted stress testing and performance profiling to assess the scalability and responsiveness of AIX file systems under heavy load conditions.

Education

Master of Computer Applications

BPUT
Odisha, India

Skills

  • Data Engineering
  • Data Analysis
  • PySpark
  • Data Visualization
  • Python Programming
  • Hadoop Ecosystem
  • Data Integration
  • ETL Processes
  • Snowflake
  • Cloud Computing
  • Kubernetes
  • Docker
  • Django
  • Kafka
  • SQL
  • Oracle
  • GCP
  • AWS
