Kishore Kotaru

Arlington, TX, USA

Summary

Data Engineer with 7 years of experience in designing and optimizing scalable data pipelines and ETL workflows for real-time and batch processing. Proficient in big data technologies such as Spark and Hadoop, and cloud platforms including AWS and Azure, enabling efficient processing of large datasets. Expertise in Python and SQL for developing reliable data processing scripts, alongside strong skills in data modeling and query optimization for both relational and NoSQL databases. Proven ability to implement CI/CD pipelines and monitoring solutions to enhance data workflow efficiency and ensure data governance compliance.

Overview

8 years of professional experience
1 Certification

Work History

Data Engineer

7-Eleven
Arlington, Texas
02.2023 - Current
  • Designed and deployed real-time inventory data pipeline using Kafka and Spark Streaming, processing over 5 million daily POS transactions.
  • Built AWS Glue ETL workflows to transform raw data into optimized Parquet/Delta Lake formats, reducing storage costs by 35%.
  • Developed PySpark scripts for data cleansing and validation, enhancing accuracy by 25% through automated anomaly detection.
  • Implemented Change Data Capture with Debezium and Kafka to sync inventory updates across over 10,000 stores in near real-time.
  • Optimized Redshift clusters via partitioning and distribution keys, cutting query times by 50% for business reporting.
  • Created Grafana dashboards to monitor pipeline health, tracking latency, throughput, and error rates.
  • Automated data lineage tracking using OpenLineage to ensure compliance with audit requirements.
  • Monitored data systems performance, identifying bottlenecks and implementing solutions to maintain system efficiency.
  • Automated data quality checks and error handling processes to ensure the integrity and reliability of datasets.
  • Managed version control and deployment of data applications using Git, Docker, and Jenkins.
  • Migrated legacy batch jobs to serverless AWS Lambda, reducing runtime by 60%.
  • Worked as part of project teams to coordinate database development and determine project scopes and limitations.
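The automated data-quality checks described above can be sketched in plain Python. This is a minimal illustration, not the production pipeline; the field names (`store_id`, `amount`) and validation rules are hypothetical stand-ins for POS-transaction records.

```python
# Sketch of an automated data-quality check with per-row error collection;
# field names and rules are illustrative assumptions, not the real schema.
from dataclasses import dataclass, field

@dataclass
class QualityReport:
    passed: int = 0
    errors: list = field(default_factory=list)

def check_transactions(rows):
    """Validate each record and collect errors instead of failing fast."""
    report = QualityReport()
    for i, row in enumerate(rows):
        problems = []
        if not row.get("store_id"):
            problems.append("missing store_id")
        if not isinstance(row.get("amount"), (int, float)) or row["amount"] < 0:
            problems.append("invalid amount")
        if problems:
            report.errors.append((i, problems))
        else:
            report.passed += 1
    return report

rows = [
    {"store_id": "S001", "amount": 12.50},
    {"store_id": "", "amount": 3.99},       # fails: missing store_id
    {"store_id": "S002", "amount": -1.00},  # fails: negative amount
]
report = check_transactions(rows)
print(report.passed, len(report.errors))  # → 1 2
```

Collecting errors row by row, rather than aborting on the first failure, lets a pipeline quarantine bad records while the rest of the batch proceeds.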

Data Engineer

ABC Fitness
09.2017 - 03.2023
  • Developed and implemented data models, database designs, data access and table maintenance codes.
  • Developed Python scripts for extracting data from web service APIs and loading it into databases.
  • Optimized existing queries to improve query performance by creating indexes on tables.
  • Analyzed user requirements, designed and developed ETL processes to load enterprise data into the Data Warehouse.
  • Created stored procedures for automating periodic tasks in SQL Server.
  • Researched and integrated new data technologies and tools to keep the data architecture modern and efficient.
  • Provided technical mentorship to junior data engineers, guiding them on best practices and project execution.
  • Established and enforced data governance policies and procedures to comply with regulatory requirements and ensure data privacy.
  • Participated in agile development processes, contributing to sprint planning, stand-ups, and reviews to ensure timely delivery of data projects.
  • Collaborated with cross-functional teams to gather requirements and translate business needs into technical specifications for data solutions.
  • Conducted rigorous testing and validation of data pipelines to ensure accuracy and completeness of data.
  • Collaborated with data scientists and analysts to understand data needs and implement appropriate data models and structures.
  • Designed data warehousing solutions, applying dimensional modeling techniques for optimized data retrieval.
  • Implemented data visualization tools like Tableau and Power BI to create dashboards and reports for business stakeholders.
  • Configured and maintained cloud-based data infrastructure on platforms like AWS, Azure, and Google Cloud to enhance data storage and computation capabilities.
  • Implemented and optimized big data storage solutions, including Hadoop and NoSQL databases, to improve data accessibility and efficiency.
  • Optimized Spark jobs for improved performance, scalability, and reliability.
  • Created Hive tables and optimized queries, stored procedures, functions, and views on Hadoop clusters.
  • Developed and implemented Spark applications using Python and Scala.
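The extract-and-load pattern described above can be sketched as follows. This is a hedged illustration: the extractor is a stub standing in for a real web-service call, and the `members` table, its columns, and the sample records are invented for the example.

```python
# Sketch of an extract-and-load step; extract_members() is a stub for an API
# call (e.g. requests.get(...).json()), and the schema is illustrative only.
import sqlite3

def extract_members():
    """Stand-in for a web-service API call."""
    return [
        {"id": 1, "name": "Alice", "plan": "gold"},
        {"id": 2, "name": "Bob", "plan": "basic"},
    ]

def load_members(conn, records):
    """Idempotent load: INSERT OR REPLACE keyed on the primary key."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS members "
        "(id INTEGER PRIMARY KEY, name TEXT, plan TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO members (id, name, plan) "
        "VALUES (:id, :name, :plan)",
        records,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
load_members(conn, extract_members())
print(conn.execute("SELECT COUNT(*) FROM members").fetchone()[0])  # → 2
```

Using `INSERT OR REPLACE` keyed on the primary key makes the load safe to re-run, a common requirement when scheduled extractions may overlap or retry.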

Education

Master of Science - Data Science

University of the Cumberlands
Williamsburg, KY
05.2024

Skills

  • Python (Pandas, PySpark)
  • SQL (Query Optimization, Window Functions)
  • Scala (Spark)
  • Bash Scripting
  • Spark, Kafka, Airflow
  • Hadoop (HDFS, Hive)
  • AWS (S3, Redshift, Lambda, Kinesis, EMR, Glue)
  • Azure (Data Factory, Synapse, Databricks)
  • GCP (BigQuery, Dataflow)
  • Snowflake / Redshift / BigQuery
  • PostgreSQL / MySQL / Oracle
  • Cassandra / MongoDB / DynamoDB
  • GDPR/CCPA Compliance
  • CI/CD (Jenkins, GitLab)
  • Terraform / CloudFormation
  • Docker / Kubernetes
  • Git / Bitbucket
  • Prometheus / Grafana
  • Big data processing
  • ETL development
  • Data modeling
  • Data governance
  • Data analysis
  • Agile methodologies
  • Data visualization
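As an illustration of the SQL window-function skill listed above, the following self-contained snippet runs a `RANK()` query through Python's built-in sqlite3 module; the `sales` table and its values are made up for the example.

```python
# Demo of a SQL window function (RANK over an ORDER BY); the table and
# data are illustrative, run in-memory via sqlite3 for self-containment.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("A", 100.0), ("B", 250.0), ("C", 250.0), ("D", 80.0)],
)
rows = conn.execute(
    """
    SELECT store, amount,
           RANK() OVER (ORDER BY amount DESC) AS rnk
    FROM sales
    """
).fetchall()
# Tied amounts (B and C) share rank 1, and the next rank is skipped.
print(rows)
```

`RANK()` leaves gaps after ties (1, 1, 3, 4); `DENSE_RANK()` would not, which is the usual interview-level distinction between the two.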

Certification

  • AWS Certified Solutions Architect - Associate
