Mayank Solanki

Clifton

Summary

Results-driven Data Engineer with over 6 years of experience architecting and deploying robust, end-to-end data platforms. Proven expertise in building automated CI/CD pipelines (GitHub Actions, Docker), large-scale ETL systems (Spark, Airflow, Delta Lake), and real-time streaming applications (Kafka, Redis). A collaborative innovator who has successfully productionized AI/ML models and built LLM-powered tools (AWS Bedrock), directly leading to significant improvements in efficiency, fraud detection, and overall profitability.

Overview

7
years of professional experience

Work History

Data Engineer

AH Infotech LLC (Sony Music Entertainment)
New York
10.2019 - Current
  • Championed and executed the architectural overhaul of a critical legacy reporting system, replacing fragile Shell scripts with a modern, containerized data platform. Pioneered a full CI/CD pipeline using GitHub Actions to automatically build and deploy version-controlled Docker images to AWS ECR. Orchestrated the new data pipeline with Apache Airflow, which dynamically utilized AWS Batch for Python tasks and AWS EMR Serverless for large-scale PySpark jobs. This new architecture reduced data processing failures by 95% and introduced new capabilities for scalability and reproducibility.
  • Architected and engineered a large-scale, end-to-end ETL pipeline orchestrated by Apache Airflow to process over 2 TB of raw data daily from the YouTube API. Utilized PySpark and Python to perform complex transformations, building a reliable data lakehouse by structuring curated datasets into Delta Lake and Apache Iceberg tables on AWS S3. Subsequently designed and implemented a dimensional data model in the data warehouse, optimized for analytics, which improved query performance by over 40% and enhanced data accessibility for royalty reporting.
  • Architected and delivered an event-driven, real-time data retrieval and caching platform using Apache Kafka and Redis. This system decoupled numerous internal applications from data sources by processing asynchronous requests for artist and track metadata via Kafka consumers. By implementing a Redis caching layer, the platform reduced redundant API calls by over 70% and slashed average data retrieval latency from seconds to sub-10 milliseconds for cached data, establishing a central source of truth for key company metadata.
  • Partnered with Data Science and ML teams to operationalize the "RADAR" (Real-time Anomaly Detection and Reporting) system, transitioning their Python model from a prototype (.pkl file) into a scalable production pipeline. Engineered an automated MLOps workflow to run batch predictions on AWS Batch, with outputs powering both a real-time dashboard for technical teams and automated monthly PDF reports for senior leadership. By identifying and flagging fraudulent streaming activity, this system directly contributed to preventing over $25,000 in potential revenue loss monthly and reduced manual data validation by 12+ hours per week.

Jr. Business Analyst

Academy Express LLC
Hoboken
10.2018 - 10.2019
  • Conducted comprehensive market analysis of competitor pricing and route coverage, leading to strategic price adjustments and the launch of new, high-demand bus stops. This initiative resulted in a 15% increase in passenger bookings and captured new market share in underserved areas.
  • Developed a Python automation tool to manage the complex daily scheduling for a fleet of 125+ buses and 100+ drivers across commuter, charter, and interstate routes. This tool eliminated a multi-hour manual process, reducing daily scheduling time to under 10 minutes.

Education

Master of Science - Management Information Systems

Pace University
New York, NY
05-2018

Skills

  • AWS: S3, EMR Serverless, AWS Batch, SageMaker, Bedrock, Lambda, Glue, ECR, SNS
  • DevOps & CI/CD: Docker, GitHub Actions, CI/CD Pipeline Development
  • Infrastructure: Containerization, Serverless Computing
  • Big Data Frameworks: Apache Spark (PySpark), Databricks
  • Workflow Orchestration: Apache Airflow, Dagster
  • Lakehouse Formats: Delta Lake, Apache Iceberg
  • ETL/ELT: Dimensional Modeling, Pipeline Architecture, Data Transformation, Automation
  • Databases: Google BigQuery, MySQL, PostgreSQL, Redis
  • Messaging & Streaming: Apache Kafka
  • ML Engineering: MLOps, Model Productionization, Batch Prediction
  • AI/LLM Services: AWS Bedrock, LLM Fine-Tuning, RAG (Retrieval-Augmented Generation)
  • Languages: Python, Spark, SQL, Shell Scripting
  • Web Frameworks: Flask, Django
  • Data Visualization: Tableau, Power BI

Timeline

Data Engineer

AH Infotech LLC (Sony Music Entertainment)
10.2019 - Current

Jr. Business Analyst

Academy Express LLC
10.2018 - 10.2019

Master of Science - Management Information Systems

Pace University