Sai Teja Chintana

Irving

Summary

Data Engineer with over 5 years of experience in data engineering and modeling, specializing in business intelligence, data warehousing, ETL processes, and leveraging cloud and big data technologies. Expertise in PySpark for complex data processing and the development of efficient ETL pipelines, along with strong proficiency in performance optimization for Snowflake and management of database structures. Proven track record of migrating SQL databases to Azure services, ensuring seamless transitions from on-premises environments, and utilizing AWS and Azure cloud services such as S3, Glue, Redshift, Data Factory, and Synapse Analytics. Demonstrated ability to lead projects through the entire lifecycle from design to deployment within Agile frameworks, with hands-on experience in workflow scheduling using Apache Airflow, automating deployments with Jenkins and Docker, and conducting code reviews via GitHub to enhance quality and facilitate knowledge sharing.

Overview

6 years of professional experience
1 Certification

Work History

Data Engineer

UBS
12.2023 - Current
  • Developed robust ETL pipelines using PySpark to extract, transform, and load data from various source systems into centralized data warehouses, ensuring data integrity, consistency, and compliance.
  • Designed and deployed reports and dashboards in Power BI Service.
  • Led data warehouse migration from Netezza to AWS Redshift.
  • Implemented data integration workflows using IBM DataStage, orchestrating data movement between DataStage and Snowflake.
  • Managed and optimized databases including MySQL, SQL Server, and PostgreSQL.
  • Developed Python scripts with AWS Lambda to ingest data into Snowflake.
  • Built Apache Spark jobs using Scala for efficient data processing and utilized SparkSQL for querying.
  • Applied PySpark's advanced transformation capabilities to perform complex data operations, including joins, aggregations, and pivoting.
  • Optimized Snowflake storage using micro-partitioning and automatic clustering to enhance query performance.
  • Configured PySpark to load data from AWS S3 to Snowflake using necessary libraries and credentials.
  • Developed Scala code for extracting and transforming cloud-based data.
  • Integrated Amazon Lex with AWS services such as Lambda, DynamoDB, and S3 for backend processing and data storage.
  • Built data pipelines for data ingestion from SQL Server to S3.
  • Implemented CI/CD pipelines using Docker, Jenkins, and GitHub, ensuring automated deployments for development and testing environments.
  • Created tables, views, and user-defined functions in Snowflake Cloud Data Warehouse.
  • Used Snowflake and PySpark to develop effective data models and provide insights throughout project sprints.
  • Utilized Apache Airflow for workflow scheduling and monitoring.
  • Experienced in the entire project lifecycle, including design, development, and deployment.
  • Conducted code reviews via GitHub to improve quality and knowledge sharing.
  • Automated deployments using Jenkins and Docker.
  • Tracked and resolved defects using JIRA within an Agile framework.
  • Familiar with development and project-tracking tools such as JIRA, Rally, and GitHub.
  • Tools and Technologies: PySpark, AWS, Amazon Connect, Git, Jenkins, Snowflake, SQL, Python, AWS Lambda, S3, EC2, EMR, Redshift.

Data Engineer

Citrix
12.2022 - 06.2023
  • Reengineered and implemented scalable data solutions utilizing Python, SQL, and Apache Spark to fulfill intricate financial data processing and analytical needs.
  • Developed ETL pipelines leveraging Informatica PowerCenter and DBT to efficiently ingest, transform, and load financial data from diverse sources into centralized data warehouses.
  • Ran Apache Beam jobs written in Python on Cloud Dataflow to perform historical data loads into BigQuery tables.
  • Engineered multiple programs using Python and Apache Beam, executing them in Cloud Dataflow to validate data integrity between raw source files and BigQuery tables.
  • Designed and developed Apache Spark jobs in Scala within a test environment to expedite data processing, employing SparkSQL for advanced querying.
  • Utilized Python scripting within DAGs to automate task execution, configured email notifications for DAG updates, and ensured workflow integrity by validating task completion with 'SUCCESS' status.
  • Configured Zookeeper and implemented Hadoop High Availability with Zookeeper failover controller, adding Scala support to establish a fault-tolerant data solution.
  • Aggregated data from multiple sources and created interactive dashboards using Power BI for insightful reporting.
  • Expertly applied Python libraries such as Pandas and NumPy to analyze and transform diverse raw file formats, including JSON, CSV, XML, and RRF.
  • Read data from Azure SQL Database tables and published it to an Azure Service Bus topic. Processed and loaded structured and unstructured data from Azure Service Bus into Azure SQL Database using Azure Data Factory with Python, including Service Bus topic creation and configuration.
  • Executed multiple MapReduce operations utilizing PySpark and NumPy, while integrating Jenkins for continuous integration and deployment.
  • Developed Spark jobs to extract data from Hive tables and process it using Dataproc. Leveraged HiveQL to analyze partitioned and bucketed data, executing queries on Parquet tables.
  • Monitored BigQuery, Dataproc, and Cloud Dataflow jobs across all environments using Stackdriver.
  • Authored Python scripts to extract ZIP files from the NIH website, update datasets monthly, apply necessary filters, perform SQL joins, and output results in JSON format.
  • Enhanced centralized logging, visualization, and monitoring capabilities using Azure Data Factory and Azure Monitor.
  • Constructed DAX queries in Power BI to generate computed columns and optimize report functionalities.
  • Tools and Technologies: PySpark, Azure Data Factory, Scala, Azure Data Lake Storage, Azure Event Hubs, Azure Databricks, Git, Spark, Hive, Python (Pandas, NumPy, TensorFlow, Matplotlib).

Jr. Data Engineer

Optum Global Solutions
09.2020 - 12.2021
  • Assisted in the design and development of data solutions using PySpark, Apache Spark, and Azure Data Factory to address retail-specific data processing and analysis requirements.
  • Collaborated with senior team members to understand requirements and contributed to the implementation of ETL pipelines for data extraction, transformation, and loading.
  • Assisted in database management tasks including data modeling, optimization, and maintenance using tools like Hive and Azure Synapse Analytics.
  • Assisted in the optimization of ETL processes using Azure Data Factory, under the guidance of senior team members, to ensure efficient data processing and loading.
  • Participated in DevOps practices including version control using Git and continuous integration and deployment using Azure DevOps.
  • Developed Spark/Scala and Python code for a regular expression (regex) project in a Hadoop/Hive environment on Linux and Windows.
  • Collaborated with retail analysts, merchandisers, and marketing teams to understand data requirements and provided assistance in delivering insights-driven solutions.
  • Supported integration efforts with IT and operations teams to ensure seamless data exchange and interoperability with retail systems and workflows.
  • Assisted in monitoring the performance of data solutions and identifying issues using Azure Monitor, under the guidance of senior team members.
  • Contributed to the documentation of data solutions, ETL processes, and data models to support knowledge sharing and continuity of retail operations.
  • Worked in Agile/Scrum sprints in a Python-based environment, including work on Excel data extracts.
  • Designed and developed weekly and monthly reports using MS Excel (charts, graphs, and pivot tables) and PowerPoint presentations.
  • Tools and Technologies: PySpark, Apache Spark, Scala, Azure Data Factory, Hive, Git, Azure DevOps, Azure Monitor.

Internship

SmartBridge/IBM
05.2020 - 08.2020
  • Participated in the development of a web-based application to predict university admission chances for students based on their academic scores.
  • Collected and analyzed data from multiple sources to identify key features impacting university admissions.
  • Cleaned and integrated data to form a comprehensive dataset, ensuring high data quality through preprocessing steps such as handling missing values and feature scaling.
  • Developed and trained multiple machine learning models using supervised learning algorithms (e.g., logistic regression, decision trees, random forests) to predict admission outcomes.
  • Evaluated models based on performance metrics like accuracy and AUC scores, addressing class imbalance issues to improve model reliability.
  • Utilized Python, scikit-learn, NumPy, Pandas, Matplotlib, and Flask for developing and deploying the application.
  • Worked with Anaconda for package management and Spyder for coding and debugging. Also used Jupyter Notebook for data visualization and model testing.
  • Assisted in developing the user interface to allow students to input their scores and receive admission predictions.
  • Documented the project workflow, including data preprocessing steps, model training, and evaluation procedures.
  • Tools and Technologies: Git, Python (Pandas, NumPy, TensorFlow, Matplotlib), Anaconda, Spyder, Flask.

Education

MASTERS IN INFORMATION STUDIES

Trine University
UNITED STATES
12-2023

BACHELOR IN COMPUTER SCIENCE & Engg. - Computer Science

REVA University
INDIA
05-2020

Skills

  • ETL development
  • Data modeling
  • Data pipeline design
  • Data migration
  • SQL programming
  • SQL and databases
  • Python
  • AWS
  • Hadoop
  • Kafka
  • Tableau/Power BI
  • Git
  • GitHub
  • Data warehousing

Certification

CLOUD COMPUTING

Timeline

Data Engineer

UBS
12.2023 - Current

Data Engineer

Citrix
12.2022 - 06.2023

Jr. Data Engineer

Optum Global Solutions
09.2020 - 12.2021

Internship

SmartBridge/IBM
05.2020 - 08.2020

BACHELOR IN COMPUTER SCIENCE & Engg. - Computer Science

REVA University

MASTERS IN INFORMATION STUDIES

Trine University

Personal Information

Title: Data Engineer