Rijoy Sharma

Senior Data Engineer
Stamford, CT

Summary

Experienced Senior Data Engineer with 14+ years of expertise in building data-intensive applications. Skilled in detail-oriented analysis, solution design, development, testing, and implementation of data warehousing/BI and software applications. Collaborates closely with system architects, software architects, and design analysts to comprehend business or industry requirements and develop comprehensive data models. Possesses strong analytical skills, exceptional problem-solving abilities, and deep understanding of database technologies and systems. Equally adept at independent or collaborative work, leveraging excellent communication skills.

Overview

14 years of professional experience
4 years of post-secondary education
1 Certification

Work History

Senior Data Engineer

Tata Consultancy Services Limited
Norwalk, Connecticut / Hyderabad, India
06.2021 - Current
  • Redeveloped a search engine marketing application in Python, Scala, and Spark SQL and migrated it, covering 200+ tables, to Google Cloud using Dataproc.
  • Architected and designed optimized data mart tables for a leading online travel agency, reducing query processing time by 40% and improving data accessibility across 5+ departments, supporting faster business insights and reporting.
  • Built reconciliation code in BigQuery to test and troubleshoot migrated data. Enhanced performance by rearchitecting Scala code, achieving a 95% reduction in execution time and optimizing data flow and computational efficiency.
  • Ensured data quality through rigorous testing, validation, and monitoring of all data assets, minimizing inaccuracies and inconsistencies.
  • Championed adoption of agile methodologies within the team, resulting in faster delivery times and increased collaboration among team members.

Senior Data Engineer

Tata Consultancy Services Limited
Bethesda, MD
07.2019 - 06.2021
  • Piloted real-time streaming of hotel reservation data using Apache Spark, Spark SQL, and Kafka, then re-implemented the project as batch processing using Hive ETL queries on the Hadoop platform
  • Automated the reservation ETL batch process through a Jenkins pipeline (later moved to Apache Airflow)
  • Implemented automated code deployment through HDP Cloudbreak services that spun up EC2 instances dynamically
  • The setup script was triggered from the HDP Cloudbreak node over SSH to one of the EC2 instances, driven by a configuration file, and pulled the installer script and its corresponding configuration file from Git
  • The setup script then triggered the installer script, which checked that the required Hadoop services were installed and running on the connected EC2 instance, created folders in the local file system, created schemas and objects in Hive, ran the ETL to process the Adobe clickstream data, read data from and wrote it back to S3, and finally sent a response to the Cloudbreak node to free up the EC2 instance once the ETL completed
  • Developed a script to move data between an S3 bucket and HDFS on the EC2 instance, in both directions, using DistCp
  • Added functionality to copy only the partitioned data to S3 tables after each ETL run
  • Built the ETL process on the Hadoop Distributed File System and tuned its execution using various performance tuning methods
  • Built Python scripts to apply cleansing and augmentation rules to the parsed JSON data and stored the PII data in an encrypted zone in Hadoop
  • Designed reservation views to read data at different levels (stay, night of stay, night of stay current, and accommodation) and pushed them to the analytical workspace through DistCp to meet specific outbound data feed and business analytics requirements
  • Developed complex analytical queries for execution on Athena through external tables created on top of S3 storage
  • Used CloudWatch Logs to monitor, store, and access log files from EC2 instances
  • Used an S3 event trigger integrated with Lambda, with SNS and SQS in between, to send alert emails to users when source files were uploaded to the S3 bucket
  • Ensured data quality through rigorous testing, validation, and monitoring of all data assets, minimizing inaccuracies and inconsistencies.
  • Reengineered existing ETL workflows to improve performance by identifying bottlenecks and optimizing code accordingly.

Data Engineer

Tata Consultancy Services Limited
Bethesda, MD
07.2017 - 06.2019
  • Designed the solution for migrating ETL applications from Netezza to Hadoop
  • Re-architected existing shell scripts by converting them into a standard Python framework and integrated them into an Airflow DAG to create an end-to-end data pipeline
  • Successfully migrated ETL applications from Netezza to IBM SoftLayer cloud and later to AWS
  • Contributed to architectural solution design for migrating both structured and semi-structured data to a data lake in AWS
  • Redesigned Netezza SQL for conversion to Apache Hive and IBM Big SQL
  • Applied several tuning methods to improve the performance of historical data conversion and incremental batch processing in AWS
  • Built a robust process to handle historical digital data conversion for 11+ TB of data (~15.5 billion records)
  • Made extensive use of Python libraries such as pandas and NumPy for data wrangling on raw data sources landing in the data lake
  • Also used Apache NiFi to implement a few long-running applications

Big Data Engineer

Tata Consultancy Services Limited
Bethesda, MD
08.2015 - 06.2017
  • Gathered and analyzed detailed requirements and converted business-specific details into technical requirements
  • Worked alongside the solution architect to design a multi-touch attribution model for the Reservation Data Warehouse of a Fortune 50 global company in the TTH industry
  • Gained a thorough understanding of Adobe web analytics data, eCommerce clickstream data, and Netlink campaigns through data analysis, data wrangling, and data mining
  • Designed data models for various dimension, fact, aggregate, and BI tables
  • Developed innovative solutions to design an attribution fact table, enhancing data accuracy and insights for analytics.
  • Worked extensively with advanced SQL to develop propagation logic
  • Added value to the system by providing an alternative solution for handling anomalous clickstream data received through the Adobe SiteCatalyst tool
  • Developed optimized Netezza queries involving fact tables (up to 8 TB) and 16 million clickstream records per day
  • Wrote complex Unix shell scripts with restartability, optimization, and code reusability, scheduled through Maestro jobs
  • Worked actively in all SDLC phases through to successful completion
  • Designed processes for receiving and loading international data from multiple upstream sources using SFTP, FTPS, Unix, nzsql, and nzload
  • Converted SAS code to Netezza code for the Special Corporate Pricing Progress Report
  • Mentored the offshore team on building robust code and on performance tuning approaches
  • Worked extensively on creating the test plan strategy and approach
  • Led a team to successful project deployment, achieving zero post-implementation defects by ensuring rigorous testing and quality control.

Big Data Engineer

Tata Consultancy Services Limited
Bethesda, MD
09.2014 - 01.2015
  • Collaborated in re-architecting and enhancing existing Account Tracking Cognos reports by optimizing IBM DB2 queries, transitioning from Netezza to improve data accuracy and performance.
  • Enhanced report query performance by applying advanced DB2 tuning techniques, achieving 90-95% alignment with existing Netezza execution benchmarks.
  • Collaborated in developing and executing load testing methodologies for simultaneous report processing, ensuring system performance under high-demand conditions.
  • Developed test case scenarios and formulated comprehensive test plans, ensuring robust validation and alignment with project requirements.

Data Engineer

Tata Consultancy Services Limited
Bethesda, MD
02.2014 - 08.2014
  • Conducted detailed requirement analysis to fully understand business needs, delivering a BI table for the Email Marketing team to support strategic and timely campaign analysis. Developed interactive dashboards to visualize email campaign performance.
  • Implemented solutions for complex use cases utilizing advanced SQL techniques, streamlining data processing and analysis.
  • Developed ETL code for parsing email campaigns based on business rules
  • Developed and implemented Informatica workflows to efficiently load data from external sources, including .csv and .dat files, ensuring data integrity and streamlined processing.
  • Developed and implemented 50+ shell scripts to parse source files, performing complex data transformations to ensure accurate and efficient data processing.
  • Developed complex Netezza queries to implement data transposition and back-propagation logic, ensuring efficient data transformation and integration.
  • Contributed to creating the test plan and test case scenarios

Data Engineer

Tata Consultancy Services Limited
03.2010 - 02.2014
  • Gained experience in troubleshooting and performance tuning of existing applications
  • Delivered multiple enhancements that automated processes through complex UNIX shell scripting, Perl, and Oracle procedures and packages
  • Troubleshot issues while working with DataStage
  • Contributed to design and coding with the Oracle Data Integrator ELT tool
  • Played an extensive role in performance tuning of Oracle SQL and procedures
  • Took part in upgrading Netezza TwinFin v6 to a v7 Striper appliance and extensively tested 75+ ETL applications of the reservation data warehouse
  • Implemented ideas to improve code reusability
  • Logged defects and fixed issues through troubleshooting
  • Worked with the Autosys tool to define, schedule, and monitor jobs
  • Contributed to data analysis and design documentation

Education

Bachelor of Technology - Electronics & Communication Engineering

West Bengal University of Technology
Durgapur, WB
01.2005 - 01.2009

Skills

Python Programming

Accomplishments

  • Awarded "Technical Excellence" for leading the Multi-touch Attribution project, recognizing innovative technical contributions.
  • Recipient of "Star of the Quarter" for exceptional work on the Email Reporting project, delivering impactful insights.
  • Recognized with "Technical Excellence" in 2013 for outstanding technical performance and project execution.
  • Honored as "Best Rising Star of the Team" in 2011 for exemplary growth and contributions to team success.

Certification

Google Cloud Certified Professional Machine Learning Engineer (certification ID: jzJRI6)

Additional Information

  • Ideathon Enthusiast
    Passionate about participating in ideathons, with experience in multiple events and notable wins for solutions powered by AI and machine learning models.
  • Full-Stack Data Science Proficiency: Experienced in the full data science pipeline, including data engineering, model building, and deploying solutions in simulated production environments.

Timeline

Google Cloud Certified Professional Machine Learning Engineer (certification ID: jzJRI6)

06-2023

Senior Data Engineer

Tata Consultancy Services Limited
06.2021 - Current

Data Engineer

Tata Consultancy Services Limited
07.2017 - 06.2019

Big Data Engineer

Tata Consultancy Services Limited
08.2015 - 06.2017

Big Data Engineer

Tata Consultancy Services Limited
09.2014 - 01.2015

Data Engineer

Tata Consultancy Services Limited
02.2014 - 08.2014

Data Engineer

Tata Consultancy Services Limited
03.2010 - 02.2014

Bachelor of Technology - Electronics & Communication Engineering

West Bengal University of Technology
01.2005 - 01.2009

Senior Data Engineer

Tata Consultancy Services Limited
07.2019 - 06.2021

Languages

English
Full Professional