Hi, I’m

Vivek Kotha

Data Engineer
Prosper, TX
It's fine to celebrate success but it is more important to heed the lessons of failure.
Bill Gates

Summary

Data Engineer with 7+ years of experience designing scalable cloud-native data platforms using PySpark, Snowflake, and AWS. Strong expertise in dimensional modeling, performance optimization, distributed data processing, and AI-enabled data platforms. Experienced in delivering end-to-end data solutions integrated with modern front-end applications.

Overview

7 years of professional experience
2 Certifications

Work History

FINRA
Rockville, MD

Data Engineer
08.2024 - Current

Job overview

  • Developed data-driven front-end applications using React.js and Angular, integrating REST APIs with Snowflake-backed data services to deliver real-time analytics dashboards.
  • Designed a cloud data lake and analytics layer on AWS using S3, the Glue Data Catalog, and Athena to enable governed storage and fast interactive queries. Integrated on-demand Athena querying directly into the UI and scheduled ETL with AWS Glue to keep datasets fresh and reliable.
  • Designed and implemented Snowflake dimensional data models (Star/Snowflake schemas) supporting analytics use cases across regulatory datasets, improving query performance by 35%.
  • Led Snowflake performance tuning initiatives using clustering keys, micro-partition pruning, query profile analysis, and warehouse scaling strategies.
  • Built scalable ELT pipelines using Snowflake, S3, and Glue, optimizing compute costs and reducing pipeline runtime by 40%.
  • Implemented secure data sharing and RBAC in Snowflake, enforcing governance and compliance standards for sensitive financial datasets.
  • Integrated Snowflake Cortex AI capabilities to enable AI-powered insights directly within the data warehouse environment.
  • Implemented data governance frameworks including encryption at rest/in transit, masking policies, secure views, and audit monitoring to ensure regulatory compliance.
  • Delivered end-to-end cloud-native data engineering solutions across AWS and Snowflake, from ingestion and modeling to API exposure and front-end integration.

Vista Applied Solutions Group
Herndon, VA

Data Engineer
06.2023 - 07.2024

Job overview

  • Designed, tested, validated, and implemented predictive models for pricing, operational efficiency, and fraud prevention using statistical analysis and data mining techniques, transforming raw data into actionable insights that drove profitable growth and supported sound business decisions.
  • Used SAS, SQL, Python, and R for data analysis and modeling. Collaborated with cross-functional teams to improve data quality and model accuracy, contributing to the development and optimization of mathematical ratemaking models that met business and product-line objectives.
  • Documented analytics projects in detail and developed user-friendly dashboards to visualize cause-and-effect relationships. Presented complex analytical findings and recommendations to internal management to inform decision-making.
  • Kept up to date with predictive modeling research, developments, and trends, and researched and applied new programming and data mining techniques. Worked effectively both under supervision and within teams to support multiple projects simultaneously.

Amazon Web Services
Herndon, VA

SDE-I (Data Engineer)
09.2022 - 04.2023

Job overview

  • Built ETL data pipelines for cleaning and preprocessing data using AWS Glue, Kafka, Databricks, PySpark, and SQL, streamlining ML-based predictive modeling on product data.
  • Designed and implemented a real-time data pipeline in Databricks that ingested and processed 100M+ semi-structured raw records from various data sources using PySpark, Scala, SQL, and Pandas.
  • Collaborated with cross-functional teams to identify and resolve data quality issues, improving overall data accuracy by 15-40%.
  • Reworked legacy machine learning models using linear regression and decision trees, tuning hyperparameters with Hyperopt and SparkTrials; parallelizing the tuning reduced development time by 80%.

University of South Florida
Tampa, FL

Teaching/Student Assistant
05.2021 - 05.2022

Job overview

  • Managed product data integration into multiple AWS storage services, including S3, Redshift, RDS, and DynamoDB, resulting in a 50% reduction in data retrieval time.
  • Performed in-depth SQL query analysis and applied database normalization techniques to improve system performance.
  • Worked on text mining and NLP components, including natural language understanding (NLU) and natural language generation (NLG), for text analytics using NLTK. Implemented a generative adversarial network (GAN) to sample data for neural-network-based predictive analysis.

TCS
Kolkata, India

Data Engineer
11.2018 - 01.2021

Job overview

  • Collaborated with a team of 2 data engineers to develop and implement a PySpark-based data ingestion pipeline, resulting in a 30% increase in processing speed.
  • Created Tableau dashboards to visualize and analyze large datasets, increasing data-driven decision-making across the organization.
  • Analyzed SQL queries in depth and applied database normalization techniques to improve system performance. Developed complex SQL transformations, window functions, and CTE-based logic to support scalable data marts and KPI reporting layers.
  • Engineered distributed PySpark data pipelines processing 500M+ records daily using optimized partitioning, broadcast joins, and memory tuning. Implemented Spark performance optimization strategies including caching, skew mitigation, and adaptive query execution.
  • Designed and implemented scalable ELT pipelines using PySpark and SQL to ingest structured and semi-structured data into Snowflake, improving data availability for analytics teams.
  • Built dimensional data models (Star and Snowflake schemas) to support enterprise reporting and advanced analytics use cases. Optimized Snowflake workloads by tuning virtual warehouses, implementing clustering keys, and analyzing query execution plans, reducing processing time by 30%.

Education

University of South Florida
Tampa, FL

Master of Science in Business Analytics and Information Systems
01.2021 - 08.2022

University Overview

GPA: 3.75/4.0

Amrita Vishwa Vidyapeetham
Bangalore, India

Bachelor of Technology in Electronics and Communication Engineering
06.2014 - 06.2018

Skills

  • Python
  • Scala
  • SQL
  • Linux/Unix
  • MS SQL
  • SSIS
  • MySQL
  • Snowflake
  • Apache Spark
  • Databricks
  • NoSQL
  • DBFS
  • Parquet
  • Avro
  • ORC
  • JSON
  • Hive
  • HBase
  • Presto
  • Zeppelin
  • Hue
  • Splunk
  • Flume
  • Git
  • GitHub
  • Jira
  • Docker
  • Tableau
  • Apache Airflow
  • Azure
  • Apache Kafka
  • Agile
  • Kinesis
  • SQS
  • S3
  • DynamoDB
  • Lambda
  • Glue
  • EMR
  • Athena
  • Redshift
  • TensorFlow
  • scikit-learn
  • Pandas
  • NumPy
  • SciPy
  • Seaborn
  • XGBoost
  • Linear Models
  • PCA
  • GLMs
  • t-SNE
  • NLTK
  • LLMs

Certification

AWS Certified Solutions Architect - Associate

Google Cloud Certified Professional Data Engineer

Technical Experience

  • Survival Analysis of Breast Cancer Patients Using Data Mining (07.2021 - 09.2021): Evaluated early detection and recurrence risk using machine learning (Two-class Neural Networks and Two-class Decision Jungle). Estimated the marginal effects of predictors such as age, treatment, cell types, nodes, and diagnosis on survival and likelihood of recurrence.
  • National Park Service Full-Stack Web Application (01.2021 - 05.2021): Developed an MVC-style full-stack application that ingests real-time data from an external API and persists it to a backend data store, using JavaScript, C#, HTML, and CSS. Hosted the solution on Microsoft Azure, enabling reliable access and streamlined operations for end users.

Timeline

Data Engineer

FINRA
08.2024 - Current

Data Engineer

Vista Applied Solutions Group
06.2023 - 07.2024

SDE-I (Data Engineer)

Amazon Web Services
09.2022 - 04.2023

Teaching/Student Assistant

University of South Florida
05.2021 - 05.2022

University of South Florida

Master of Science in Business Analytics and Information Systems
01.2021 - 08.2022

Data Engineer

TCS
11.2018 - 01.2021

Amrita Vishwa Vidyapeetham

Bachelor of Technology in Electronics and Communication Engineering
06.2014 - 06.2018