Thribhuvan Sai Tej D

Summary

  • Proficient IT professional with over 9 years of experience specializing in data engineering, big data, and back-end engineering across the data ecosystem: acquisition, ingestion, modeling, storage, analysis, integration, and processing.
  • Extensive experience working with GCP, AWS, Databricks, Synapse Analytics, Azure Data Factory, Stream Analytics, Azure Analysis Services, Data Lake, Azure Storage, Azure SQL Database, SQL Data Warehouse, and Azure Cosmos DB.
  • Expertise with Azure services including HDInsight, Application Insights, Azure Monitor, Azure AD, Function Apps, Logic Apps, Event Hubs, IoT Hubs, Storage Explorer, and Key Vault.
  • Strong working experience with SQL and NoSQL databases (Azure Cosmos DB, MongoDB, HBase, Cassandra), including data modeling, tuning, disaster recovery, backup, and data pipeline creation.
  • Extensive experience creating pipeline jobs and schedule triggers using Azure Data Factory.
  • Good experience designing cloud-based solutions in Azure: creating Azure SQL databases, setting up Elastic Pool jobs, and designing tabular models in Azure Analysis Services.
  • Strong knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis.
  • Experience in all phases of data warehouse development: requirements gathering, design, development, implementation, testing, and documentation.
  • Profound knowledge of developing production-ready Spark applications using Spark Core, Spark Streaming, Spark SQL, DataFrames, Datasets, and Spark ML.
  • Expertise in building PySpark and Spark applications for interactive analysis, batch processing, and stream processing; used the Spark DataFrame API extensively on the Cloudera platform to analyze Hive data and perform required data validations.
  • Strong Hadoop and platform support experience with the entire suite of tools and services in the major Hadoop distributions: Cloudera, Azure HDInsight, AWS, and Hortonworks.
  • Hands-on experience with Hadoop ecosystem components including Hive, Pig, Sqoop, HBase, Cassandra, Spark, Spark Streaming, Spark SQL, Oozie, ZooKeeper, Kafka, Airflow, Flume, MapReduce, and YARN.
  • Skilled in Azure authentication and authorization and in visualization tools such as Tableau and Power BI; basic hands-on experience with Kusto.
  • Capable of using AWS utilities such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs on Amazon Web Services (AWS).
  • Working knowledge of Amazon S3, Amazon EC2, and AWS Kinesis for computing, query processing, and storage across a wide range of applications.
  • Experienced in micro-batching to ingest millions of files into the Snowflake cloud data warehouse as they arrive in the staging area; ingested data into Snowflake using Snowpipe.
  • Good experience maintaining version control using code-versioning tools and Azure DevOps.
  • Skilled in creating dashboards in Power BI, Tableau, and Jupyter notebooks (Matplotlib, Seaborn).
  • Experience configuring and monitoring data processing and data storage solutions.
  • Experience importing and exporting data using Sqoop between HDFS and relational database systems.
  • Experienced across the full workflow (requirement study, analysis, design, coding, testing, deployment, and maintenance) in event-driven and client/server application development.
  • Expertise in cloud cost management and optimization techniques across projects.

Overview

10 years of professional experience

Work History

Data Engineer

Amazon
Austin, United States
10.2021 - Current
  • Led the development and enhancement of production-grade data ingestion and processing pipelines, fine-tuning functionality to meet performance benchmarks and scalability requirements.
  • Built event-driven Lambdas and Step Functions state machines to overhaul purchase-order integration and distribution to downstream microservices, achieving a 40% reduction in worker time; containerized the workloads in Docker.
  • Leveraged AWS serverless services such as S3, Athena, Lambda, SQS, SNS, Step Functions, Redshift, and Glue to design and implement scalable, cost-efficient solutions.
  • Acted as a liaison between unclear requirements and robust solutions, applying problem-solving acumen to design adaptable architectures aligned with evolving business needs.
  • Provided hands-on leadership in developing and refining data ingestion and processing pipelines using Python, Databricks, and SnowSQL, ensuring seamless data flow and processing efficiency.
  • Developed custom operators and hooks in Airflow to extend its functionality and integrate with various systems.
  • Integrated Kafka with data processing frameworks such as Spark and Flink to build end-to-end data pipelines.
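The event-driven purchase-order fan-out above can be sketched as a minimal Lambda handler. This is an illustrative outline only, not the production code: the field names (`order_id`, `items`) and the downstream target are assumptions, and the real pipeline dispatches to actual microservice endpoints rather than returning payloads.

```python
import json

def handler(event, context=None):
    """Hypothetical sketch of an event-driven Lambda: parse SQS records
    carrying purchase orders and build fan-out payloads for a downstream
    microservice. Field names and targets are illustrative assumptions."""
    payloads = []
    for record in event.get("Records", []):
        order = json.loads(record["body"])  # SQS delivers the message as a JSON string
        payloads.append({
            "order_id": order["order_id"],
            "destination": "inventory-service",  # assumed downstream consumer
            "items": order.get("items", []),
        })
    # In production this would publish to SNS/SQS or invoke a Step Functions
    # state machine; here we just return the built payloads.
    return {"dispatched": len(payloads), "payloads": payloads}
```

In the real system, a Step Functions state machine would typically orchestrate retries and parallel dispatch around handlers like this one.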

Data Engineer

CVS
United States
01.2020 - 09.2021
  • Designed and implemented robust ETL pipelines in Azure Data Factory (ADF) to facilitate seamless data flow.
  • Worked on data migration projects, ensuring a smooth transition of data to GCP; migrated ADF ETL workloads to Cloud Composer (Apache Airflow on GCP).
  • Utilized Apache Beam to build scalable, parallelized data processing workflows on GCP.
  • Developed end-to-end pipelines in ADF to extract, transform, and load (ETL) data from diverse sources into Azure Data Lake Storage.
  • Implemented efficient data ingestion strategies in ADF, optimizing data flow from on-premises and cloud-based sources.
  • Designed and configured ADF pipelines, ensuring scalability and reliability for real-time and batch data processing.
  • Developed and maintained data lakes on GCP, optimizing storage and retrieval of structured and unstructured data.
  • Validated data correctness and application functionality in the GCP environment, and conducted thorough testing of data pipelines and workflows.
  • Leveraged Google BigQuery for high-performance SQL queries and analytics on large datasets.
  • Implemented data validation and cleansing techniques to ensure data accuracy and consistency for reporting and analytics.
  • Engineered solutions in Databricks for data cleansing, aggregation, and feature engineering for predictive modeling.
  • Integrated the Delta Lake architecture in Databricks for ACID transactions, enabling robust and reliable data processing.
  • Conducted performance tuning and optimization of Spark jobs in Databricks for enhanced processing efficiency.
  • Set up and managed data replication within the Azure data ecosystem, using services such as Azure Data Factory and Azure SQL Data Sync to replicate and synchronize data across multiple databases and data stores, ensuring consistency and availability for real-time analytics and disaster recovery in distributed applications.
  • Collaborated with cross-functional teams, including data scientists and business analysts, to understand data requirements and deliver optimal solutions.
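The validation-and-cleansing step described above can be illustrated with a small pure-Python function. The schema (`id`, `amount`, `name`) and rules are assumptions for the sketch; the actual work ran inside ADF data flows and Databricks jobs against real source schemas.

```python
import re

def cleanse_records(records):
    """Hypothetical validation/cleansing pass of the kind described above:
    drop rows missing required fields, trim and collapse whitespace, and
    coerce numeric fields. Field names are illustrative, not the project's."""
    clean = []
    for row in records:
        # Reject rows that fail the required-field checks.
        if not row.get("id") or row.get("amount") in (None, ""):
            continue
        clean.append({
            "id": str(row["id"]).strip(),
            "amount": float(row["amount"]),          # coerce to a numeric type
            "name": re.sub(r"\s+", " ", str(row.get("name", "")).strip()),
        })
    return clean
```

In a Spark/Databricks setting, the same rules would be expressed as DataFrame filters and column expressions so they scale across partitions.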

Data Analyst/Data Engineer

Client: Abbvie
India
03.2018 - 01.2020
  • Implemented data connectors and APIs to facilitate seamless integration with various data systems, ensuring reliable and timely ingestion of structured, semi-structured, and unstructured data.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/big data techniques.
  • Conducted a comprehensive assessment of the source dataset, including data structure, quality, and dependencies, to determine the optimal ETL strategy for the migration.
  • Designed and developed custom PowerApps solutions tailored to specific business needs, streamlining processes and enhancing user experience.
  • Developed and optimized QuickBase applications, leveraging their functionality to build robust systems for data management and process automation.
  • Utilized ETL tools such as Informatica and SSIS to extract data from legacy systems, transform it per business rules, and load it into the target system, ensuring minimal disruption to ongoing operations and preserving data integrity.
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats.
  • Orchestrated seamless integration between PowerApps, SharePoint, and QuickBase, creating a cohesive ecosystem for data flow and process automation.

Data Analyst/Data Engineer

Odessa Technologies
Bengaluru, India
06.2014 - 02.2018
  • Actively participated in sprint planning sessions and daily Agile Scrum meetings, contributing to the collaborative development process.
  • Led the project lifecycle, from initial requirement gathering through full development of the application.
  • Set up and programmed within the Anaconda Python environment, demonstrating proficiency in its creation and activation.
  • Developed programs for performance calculations using NumPy and SQLAlchemy, ensuring efficient and accurate computations.
  • Performed data analysis and created data-mapping documents capturing source-to-target transformation rules.
  • Created and modified T-SQL queries per business requirements; built role-playing dimensions, factless fact tables, and snowflake and star schemas.
  • Wrote, executed, and performance-tuned SQL queries for data analysis and profiling, including complex queries with joins, subqueries, and correlated subqueries.
  • Involved in the development and implementation of SSIS, SSRS, and SSAS solutions for various business units across the organization.
  • Employed packages such as Beautiful Soup for data parsing and contributed to SOAP web services for seamless data exchange in XML format.
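The correlated-subquery work mentioned above can be illustrated with a small self-contained example. It uses Python's built-in `sqlite3` rather than T-SQL purely so the snippet runs anywhere; the `orders` table and its data are made up for the example.

```python
import sqlite3

# Illustrative sketch of SQL used for data analysis/profiling: a correlated
# subquery that picks each customer's largest order. Schema and rows are
# invented for this example; the real work used T-SQL against SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'acme', 120.0), (2, 'acme', 340.0), (3, 'globex', 99.0);
""")
rows = conn.execute("""
    SELECT customer, amount
    FROM orders o
    WHERE amount = (SELECT MAX(amount)
                    FROM orders i
                    WHERE i.customer = o.customer)   -- correlated on customer
    ORDER BY customer
""").fetchall()
# rows -> [('acme', 340.0), ('globex', 99.0)]
```

The same query pattern carries over to T-SQL unchanged; performance tuning there typically means checking the plan for a supporting index on `(customer, amount)`.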

Education

Bachelor of Engineering in Computer Science

SRM UNIVERSITY
01.2014

Skills

  • Data Analysis
  • SQL and Databases
  • NoSQL Databases
  • Database Development
  • Data Warehousing
  • Software Development Life Cycle (SDLC)

Timeline

Data Engineer

Amazon
10.2021 - Current

Data Engineer

CVS
01.2020 - 09.2021

Data Analyst/Data Engineer

Client: Abbvie
03.2018 - 01.2020

Data Analyst/Data Engineer

Odessa Technologies
06.2014 - 02.2018

Bachelor of Engineering in Computer Science

SRM UNIVERSITY