PARTHA MEHTA

Randolph Township, NJ

Summary

  • Highly skilled and motivated Data Engineer with a robust track record in building, scaling, and optimizing large-scale data pipelines and distributed systems at Cisco Talos, contributing to the Threat Analytics Platform (TAP) Core Development team.
  • Proficient in PySpark, Databricks, Go, and Python, with extensive experience in cloud technologies including AWS, Terraform, and Azure, as well as expertise in modern data lake and Delta Lake architectures.
  • Demonstrated success in designing and deploying end-to-end data workflows spanning prevalence aggregation, retention policies, SCD2 modeling, and event-driven ingestion pipelines. Adept at developing resilient ETL/ELT pipelines and automating monitoring to improve reliability for multi-terabyte datasets.

Overview

6 years of professional experience
1 certification

Work History

Cloud Engineer (Talos: TAP Core Dev Team)

Cisco Systems, Inc.
10.2023 - Current
  • Built and maintained multi-terabyte ETL pipelines for threat datasets.
  • Designed and orchestrated first and second-level prevalence aggregation pipelines in Databricks.
  • Developed retention frameworks and job workflows in Delta Lake, reducing storage costs and improving performance.
  • Led validation initiatives with SQL notebooks, comparing DEV vs. PROD results across observables.
  • Created a Go-based CLI tool to re-drive failed Step Function executions; it proved critical during a major customer outage in which ingest data was missing.
  • Engineered and deployed the health-check system in Go.
  • Implemented a storage-efficient SCD2 pipeline, reducing data redundancy by more than 70%.
  • Built Managed Delta tables for near-real-time API ingestion and long-term historical tracking, integrated with Databricks, S3, and ClickHouse.
  • Contributed to AI/LLM initiatives, including an internal TAP chatbot.
  • Provided on-call support for pipeline incidents, including triage, repair, RCA, and documentation.
  • Deployed Spark jobs, Step Functions, Lambdas, and IAM policies via Terraform across multi-region AWS.
  • Built a long-running Databricks job detection system for anomaly alerting.
  • Contributed to TAP-wide Go packages, focusing on reusable libraries for AWS service integration.
  • Updated the Golang and Alpine base CI images and upgraded Go and Terraform to their latest versions for the SOC 2 audit.
  • Updated the TAP team's code repositories for compatibility and consistency with the latest Terraform and Go versions.
  • Improved system reliability by replacing static cron schedules with dependency-driven triggers in Databricks, preventing rollups on incomplete datasets.
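The SCD2 bullets above refer to Type-2 slowly-changing-dimension handling in Delta Lake. A minimal, framework-agnostic sketch of the core merge logic (plain Python standing in for the actual Databricks/Delta Lake implementation; the record fields and the `apply_scd2` helper are illustrative, not from the real pipeline):

```python
def apply_scd2(current_rows, incoming, today):
    """Apply Type-2 slowly-changing-dimension logic to a snapshot.

    current_rows: list of dicts with keys id, attrs, valid_from,
                  valid_to, is_current
    incoming:     dict of id -> attrs representing the latest snapshot
    Returns an updated row list: changed rows are closed out and a new
    current version appended; unchanged rows pass through untouched, so
    only changed keys produce new rows (the storage-efficient property).
    """
    out, seen = [], set()
    for row in current_rows:
        if row["is_current"] and row["id"] in incoming:
            seen.add(row["id"])
            new_attrs = incoming[row["id"]]
            if new_attrs != row["attrs"]:
                # Close out the old version instead of overwriting it,
                # preserving full history for the key.
                out.append(dict(row, valid_to=today, is_current=False))
                out.append({"id": row["id"], "attrs": new_attrs,
                            "valid_from": today, "valid_to": None,
                            "is_current": True})
                continue
        out.append(row)
    # Brand-new keys get an initial current version.
    for key, attrs in incoming.items():
        if key not in seen and not any(r["id"] == key for r in current_rows):
            out.append({"id": key, "attrs": attrs, "valid_from": today,
                        "valid_to": None, "is_current": True})
    return out
```

In the real pipeline this logic would run as a Delta Lake MERGE keyed on the business id, but the close-out-then-insert pattern is the same.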

Data Scientist - Full Time CPT

Syntactech
01.2023 - 05.2023
  • Worked on marketing analytics initiatives using statistical modeling, ML techniques, and Time Series forecasting.
  • Delivered actionable insights via automated dashboards, KPI tracking, A/B testing, and campaign analysis to support data-driven decisions.
  • Built forecasting models that enabled accurate sales planning and strategic goal setting.
  • Analyzed customer behavior to assess product impact, reduce churn, and enhance engagement strategies.
  • Developed strategies to optimize client channel placement and improve commercial account performance.
  • Built scalable revenue prediction models, helping drive long-term business planning.
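The forecasting bullets above describe time-series models for sales planning. A minimal sketch of one common technique, Holt's linear-trend exponential smoothing (the function, parameters, and data here are illustrative, not taken from the actual engagement):

```python
def holt_forecast(series, alpha=0.5, beta=0.3, horizon=3):
    """Holt's linear-trend exponential smoothing.

    Smooths level and trend components over the history (series must
    have at least two points), then projects the trend forward
    `horizon` steps; returns the list of forecasts.
    """
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        # Update the level toward the new observation, then update the
        # trend toward the observed change in level.
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + (h + 1) * trend for h in range(horizon)]
```

On a perfectly linear series such as 10, 20, 30, 40, the smoother recovers the trend exactly and the projections continue the line.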

Data Engineer - Part Time On Campus Role

GEP Worldwide
09.2022 - 12.2022
  • Automated data ingestion and parsing guide generation using Azure Data Factory and Databricks, enabling same-day client onboarding (down from 4 days).
  • Built end-to-end monitoring and error logging system with Azure Log Analytics and Power BI for real-time visibility.
  • Improved data processing efficiency through optimized ingestion logic using Databricks and Apache Kafka.
  • Revamped ETL with an automated framework, increasing data accuracy and reducing processing time.
  • Delivered data cube reports via Azure Data Factory and triggered Spark jobs within ADF pipelines for scalable processing.
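The monitoring and error-logging bullets above follow a standard pattern: isolate per-record failures so one bad row never fails a whole batch, and emit counts for a dashboard. A generic sketch (plain Python standing in for the Azure Data Factory / Log Analytics setup described; `ingest_batch` and its arguments are illustrative):

```python
import logging

logger = logging.getLogger("ingestion")

def ingest_batch(records, parse, sink):
    """Parse and load a batch, logging failures instead of aborting.

    Each record is parsed and appended to `sink`; bad records are
    logged with enough context for a downstream dashboard (row index,
    error) and counted, so malformed rows are quarantined rather than
    failing the batch.
    """
    ok, failed = 0, 0
    for i, raw in enumerate(records):
        try:
            sink.append(parse(raw))
            ok += 1
        except Exception as exc:
            failed += 1
            logger.error("row %d failed: %s", i, exc)
    # Summary metrics would feed real-time visibility; returned here.
    return {"ok": ok, "failed": failed}
```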

Technology Consultant/Data Engineer

PricewaterhouseCoopers SDC
08.2019 - 08.2021
  • Worked on end-to-end development of real-time and batch data pipelines using Snowflake, Spark, Azure, and AWS, supporting user analytics, content recommendations, and enterprise reporting at scale.
  • Automated data ingestion, ETL frameworks, and monitoring systems across Azure Data Factory, Logic Apps, and AWS services, reducing processing time, improving data accuracy, and cutting operational costs.
  • Migrated critical pipelines from third-party tools (e.g., Informatica to native AWS) and built reusable frameworks for Salesforce, HR, and media data feeds, enabling secure, scalable, and cost-efficient ingestion.
  • Designed internal tools for query optimization, job monitoring, and real-time analytics using Hive, Elasticsearch, Cassandra, Django, and QuickSight, improving performance, data governance, and developer productivity.

Education

Master of Science - Data Science Computational Track

New Jersey Institute of Technology
Newark, NJ
05-2023

Bachelor of Science - Computer Science

Nitte Meenakshi Institute of Technology, VTU
Bangalore, India
05-2019

Skills

  • Programming Languages & Tooling: Golang, Python, Scala, Java, Shell Scripting, gRPC, Terraform
  • Tools & Frameworks: Docker, Kubernetes, Grafana, Airflow, Streamlit, Microservices, Linux, Kibana, Django, Flask, Prometheus
  • Data Engineering (Processing & Storage): Apache Spark, PySpark, Scala Spark, Databricks, Delta Lake, Unity Catalog, Observability
  • Data Engineering with Cloud Services: AWS (Lambda, S3, DynamoDB, Athena, Glue, EKS, ECS, ELB, Redshift, SNS, SES, SQS, Cloudwatch, Eventbridge, Cloud Trail, EC2, Security Group, EMR, Auto Scaling, RDS, CFT, Step Functions, Kinesis), Databricks
  • Software Development: Agile Methodologies, CI/CD (Github Actions, Gitlab), Version Control (GIT, SVN), Jira, Confluence
  • Security: IAM, SOC 2 documentation, Incident Response and Mitigation, Vault
  • Databases: NoSQL (DynamoDB, MongoDB), Relational (MySQL, Redshift)
  • Data Science: Algorithms with ML, NLP, GenAI (LLMs, RAG Pipelines, Agents, LlamaIndex, OpenAI, GPT, Langchain models)

Accomplishments

  • Received Connected Recognitions from team members, including my manager, at Cisco.
  • Received a Connected Recognition from the Director of Security Research for developing an internal AI agent at Cisco.
  • Identified and fixed PROD issues related to data availability in our event-driven AWS pipelines, particularly during on-call rotations at Cisco.
  • Identified and fixed PROD bugs in CI/CD within my first two months at Cisco, receiving recognition for the effort.
  • Received two promotions within two years of joining PwC SDC (US Advisory, Bangalore, India) as a new graduate.
  • Received real-time recognitions and On-the-Spot Awards from offsite and onsite teams at PwC.

Certification

  • HashiCorp Certified: Terraform Associate 2025 on Udemy
  • The Ultimate MySQL Bootcamp: Go from SQL Beginner to Expert on Udemy
  • AWS Certified Data Engineer Associate 2025 - Hands On! on Udemy
  • Databricks Certified Data Engineer Associate - Preparation on Udemy
  • Docker for the Absolute Beginner - Hands On - DevOps on Udemy
  • Docker Certified Associate 2023 on Udemy
  • Kubernetes Certified Application Developer (CKAD) with Tests on Udemy
  • Participated in Talos AI Hackathon (2025) at Cisco
