SUNNY MALIK

Sunnyvale, USA

Summary

Accomplished Platform and Data Engineering Leader with 12+ years of experience architecting scalable cloud-native and hybrid data platforms across Fortune 500 companies, including Apple, Samsung, MasterCard, and Kohl’s. Proven expertise in designing low-latency, high-throughput pipelines, unified data lakehouses, and secure, privacy-compliant AI platforms.

Recognized for cross-org leadership and for reviewing technical specifications and service designs at inception. Adept in cloud (AWS, GCP), Kubernetes, Spark, Kafka, Delta Lake, Iceberg, and large-scale data governance. Known for bridging engineering execution with visionary system design.

Overview

16 years of professional experience
1 Certification

Work History

Staff Platform Engineer

Apple Inc.
Cupertino, USA
08.2021 - Current
  • Designed Apple’s Hybrid Cloud File System enabling seamless data movement between AWS, Google Cloud, and Apple’s on-premise GPU data centers—supporting high-performance AI model training with strict security and compliance.
  • Designed Storage-Agnostic Access Layer, allowing producers and consumers to interact with data through Apple-defined endpoints, masking the underlying storage provider—facilitating dynamic backend switching based on cost, speed, and locality.
  • Created Apple’s REST-Based Catalog to Replace Hive Metastore, addressing security, performance, and scalability limitations—while ensuring seamless migration with zero downtime.

Staff Data Engineer

Samsung SmartThings
04.2019 - 08.2021
  • Designed and Implemented "ControlMesh," a Kubernetes-Native, Self-Healing Infrastructure Framework by integrating Crossplane and FluxCD—replacing traditional static IaC methods (e.g., Terraform) with a continuously reconciled, declarative model. This pioneering solution eliminated configuration drift, automated correction of manual changes, and improved system uptime, auditability, and operational efficiency across Samsung SmartThings' large-scale cloud infrastructure.
  • Built Unified Data Lake Architecture integrating real-time and batch datasets, enabling Samsung’s Data Science and Analytics teams to train advanced ML models on smart device telemetry and user behavior.
  • Designed & Deployed Real-Time Ingestion Pipelines that reliably ingest up to 4 TB of IoT data per hour with near-zero data loss, ensuring SmartThings can scale to billions of daily device events across the globe.
  • Designed and built “OffsetIQ: Spark-Kafka Connector”, a high-reliability Spark Streaming component for Kafka that:
    Avoids native Spark-Kafka offset issues.
    Prevents data loss and consumer lag.

Staff Data Engineer

MasterCard
10.2017 - 03.2019
  • Architected and Led Design of Mastercard’s On-Prem Real-Time Big Data Platform, enabling mission-critical data processing within secure, compliant boundaries.
  • Built Unified Real-Time Transaction Visibility System — integrated various transaction types (credit, debit, fraud alerts, etc.) into a single searchable real-time view using Kafka, Apache Flume, Spark Streaming, HBase, and Solr.
  • Built a high-speed, low-latency data ingestion and search pipeline using the open-source Spark-Kafka connector on Cloudera Hadoop (CDH), integrating Apache Solr for real-time indexing and instant searchability of large-scale financial transactions—empowering analysts and fraud teams with immediate data access.
  • Collaborated Cross-Functionally with enterprise teams to ensure platform stability, business logic alignment, and timely delivery under strict SLAs.

Lead Engineer

JCPenney
08.2017 - 10.2017
  • Led Big Data initiatives across AWS and GCP, designing a scalable framework for the data science team to run R-based workloads on Google Cloud, implementing a real-time clickstream ingestion pipeline using AWS Kinesis, building interactive Kibana dashboards, and creating a data lake with Datameer to support analytics and personalization use cases.

Sr. Architect - Big Data

Kohl’s Corporation
09.2015 - 08.2017
  • Designed and Implemented a Unified Data Management System (DMS) that enabled multi-modal AI model training by seamlessly combining structured (e.g., logs, tables) and unstructured (e.g., images, text, video) data into immutable, versioned datasets. This innovative architecture significantly enhanced model reproducibility, explainability, and performance, laying the foundation for generative AI use cases across recommendation, summarization, and vision-language tasks, years ahead of industry adoption trends.
  • Engineered a scalable on-premise Data Lake and real-time inventory system using Hadoop, Kafka, Spark, Hive, and Pig to support AI-driven analytics. Partnered with data scientists to deploy core business algorithms, enabling real-time decision-making across marketing, inventory, and customer engagement platforms.

Lead Analyst

Happiest Minds Technologies
11.2013 - 08.2015
  • Led development of a data analytics platform for retail clients, integrating Hadoop, Spark, and Hive to process large-scale transactional data. Designed ETL pipelines and reporting systems that enabled real-time insights into customer behavior, inventory trends, and sales performance.

Sr Developer | Project Lead

ITG Inc
Culver City, USA, Bangalore, India
01.2009 - 12.2013
  • Led Compliance Log Analytics Project: Designed and developed a log analysis system using Hive and Hadoop to detect non-compliant activity across production servers via Clickstream data.
  • Introduced Hadoop as an ETL Framework: Re-architected legacy Java-based ETL workflows using Hadoop to improve performance and memory efficiency; deployed across a 14-node cluster.
  • Optimized ETL with MapReduce: Utilized advanced Hadoop features such as DistributedCache and SequenceFiles for high-throughput and resource-optimized data processing.
  • Built SQL Query Automation Tool: Created a user-friendly, UI-based SQL generator using ExtJS and Spring to automate data mapping for pre/post analytics validation.
  • End-to-End Ownership: Took full responsibility for research, design, deployment, and maintenance of the Hadoop ecosystem, ensuring seamless integration with existing Unix-based infrastructure.
  • Mentored Junior Developers: Acted as a project lead, guiding a team of 3 developers through delivery of scalable data processing solutions.

Education

Master of Science - Computer Science

University of Southern California
USA
12.2008

Bachelor of Technology - Electronics and Comm.

Uttar Pradesh Technical University
Noida, India
05.2006

Skills

  • Scala, Java, Python, Go
  • Lakehouse, Delta Lake, and Data Lake expertise
  • Cloud architecture and management
  • Container orchestration
  • Data pipeline design
  • Infrastructure as code
  • Mentorship and leadership
  • AWS and GCP expertise
  • Data processing frameworks
  • Database management systems
  • Workflow automation tools
  • Big data technologies
  • Version control systems
  • Project management tools
  • Web development frameworks
  • Operating systems

Certification

  • Data Engineering on Google Cloud Platform Specialization, 10/01/17, O'Reilly
  • Certified Spark Developer (Databricks), 01/01/16
  • Cloudera Certified Hadoop Developer (CCDH), 03/01/15
  • Certified Scala Developer (Coursera), 06/01/14
  • Sun Certified Java Programmer (SCJP 1.5), 05/01/10

Timeline

Staff Platform Engineer

Apple Inc.
08.2021 - Current

Staff Data Engineer

Samsung SmartThings
04.2019 - 08.2021

Staff Data Engineer

MasterCard
10.2017 - 03.2019

Lead Engineer

JCPenney
08.2017 - 10.2017

Sr. Architect - Big Data

Kohl’s Corporation
09.2015 - 08.2017

Lead Analyst

Happiest Minds Technologies
11.2013 - 08.2015

Sr Developer | Project Lead

ITG Inc
01.2009 - 12.2013

Master of Science - Computer Science

University of Southern California

Bachelor of Technology - Electronics and Comm.

Uttar Pradesh Technical University