Highly skilled and motivated Data Engineer with a robust track record in building, scaling, and optimizing large-scale data pipelines and distributed systems at Cisco Talos, contributing to the Threat Analytics Platform (TAP) Core Development team.
Proficient in PySpark, Databricks, Go, and Python, with extensive experience in cloud technologies including AWS, Terraform, and Azure, as well as expertise in modern data lake and Delta Lake architectures.
Demonstrated success in designing and deploying comprehensive end-to-end data workflows that encompass prevalence aggregation, retention policies, SCD2 modeling, and event-driven ingestion pipelines. Adept at developing resilient ETL/ELT pipelines and automating monitoring to improve reliability for multi-terabyte datasets.
Overview
6 years of professional experience
1 Certification
Work History
Cloud Engineer (Talos: TAP Core Dev Team)
Cisco Systems, Inc.
10.2023 - Current
Built and maintained multi-terabyte ETL pipelines for threat datasets.
Designed and orchestrated first- and second-level prevalence aggregation pipelines in Databricks.
Developed retention frameworks and job workflows in Delta Lake, reducing storage costs and improving performance.
Ran data-validation initiatives using SQL notebooks to compare DEV and PROD results across observables.
Created a Go-based CLI tool to re-drive or re-run failed Step Functions executions; it proved invaluable during a major customer outage in which ingest data was missing (see the redrive sketch after this list).
Engineered and deployed the health-check system in Go.
Implemented a storage-efficient SCD2 pipeline for a dataset, reducing data redundancy by more than 70% (see the Delta MERGE sketch after this list).
Built Managed Delta tables for near-real-time API ingestion and long-term historical tracking, integrated with Databricks, S3, and ClickHouse.
Contributed to AI/LLM initiatives, including an internal TAP chatbot.
Provided on-call support for pipeline incidents, including triage, repair, RCA, and documentation.
Deployed Spark jobs, Step Functions, Lambdas, and IAM policies via Terraform across multi-region AWS.
Built a long-running Databricks job detection system for anomaly alerting (see the runs-list sketch after this list).
Contributed to TAP-wide Go packages, focusing on reusable libraries for AWS service integration.
Updated the Go and Alpine base CI images and upgraded Go and Terraform to their latest versions for the SOC 2 audit.
Updated TAP team repositories for compatibility and consistency with the latest Terraform and Go versions.
Improved system reliability by replacing static cron schedules with dependency-driven triggers in Databricks, preventing rollups on incomplete datasets (see the dependency-check sketch below).
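The following is a minimal Python/boto3 sketch of the redrive idea behind the Go CLI described above; the production tool was written in Go, the state machine ARN is a placeholder, and the fresh-execution fallback is an illustrative assumption. It also assumes a boto3 release that includes the Step Functions RedriveExecution API.

    # Sketch only: re-drive failed Step Functions executions, or re-run them from scratch.
    import boto3

    sfn = boto3.client("stepfunctions")
    STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:example"  # placeholder

    def redrive_failed_executions(state_machine_arn: str) -> None:
        """List FAILED executions and redrive each one from its point of failure."""
        paginator = sfn.get_paginator("list_executions")
        for page in paginator.paginate(stateMachineArn=state_machine_arn, statusFilter="FAILED"):
            for execution in page["executions"]:
                arn = execution["executionArn"]
                try:
                    sfn.redrive_execution(executionArn=arn)
                    print(f"re-drove {arn}")
                except sfn.exceptions.ExecutionNotRedrivable:
                    # Fallback assumption: start a fresh execution with the original input.
                    detail = sfn.describe_execution(executionArn=arn)
                    sfn.start_execution(stateMachineArn=state_machine_arn, input=detail["input"])
                    print(f"re-ran {arn} as a new execution")

    if __name__ == "__main__":
        redrive_failed_executions(STATE_MACHINE_ARN)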
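A minimal PySpark and Delta Lake sketch of the SCD2 pattern referenced above; the table names (staging_observables, dim_observables) and columns (observable_id, attrs, event_ts, valid_from, valid_to, is_current) are illustrative assumptions, not the production schema.

    # Sketch only: SCD2 upsert against a Delta table with assumed names.
    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    updates = spark.table("staging_observables")            # assumed staging source
    target = DeltaTable.forName(spark, "dim_observables")   # assumed SCD2 target

    # Step 1: close out current rows whose tracked attributes changed.
    (target.alias("t")
        .merge(updates.alias("s"), "t.observable_id = s.observable_id AND t.is_current = true")
        .whenMatchedUpdate(
            condition="t.attrs <> s.attrs",
            set={"is_current": "false", "valid_to": "s.event_ts"})
        .execute())

    # Step 2: append a new current row for every changed or brand-new key.
    still_current = spark.table("dim_observables").where("is_current = true")
    new_rows = (updates
        .join(still_current.select("observable_id"), "observable_id", "left_anti")
        .withColumn("valid_from", F.col("event_ts"))
        .withColumn("valid_to", F.lit(None).cast("timestamp"))
        .withColumn("is_current", F.lit(True))
        .select("observable_id", "attrs", "valid_from", "valid_to", "is_current"))
    new_rows.write.format("delta").mode("append").saveAsTable("dim_observables")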
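A simplified Python sketch of the long-running job detection idea above, using the Databricks Jobs API 2.1 runs/list endpoint; the environment variables and the six-hour threshold are assumptions, and the real system raises alerts rather than printing.

    # Sketch only: flag Databricks job runs active longer than a threshold.
    import os
    import time
    import requests

    HOST = os.environ["DATABRICKS_HOST"]     # e.g. https://<workspace>.cloud.databricks.com
    TOKEN = os.environ["DATABRICKS_TOKEN"]   # personal access token
    THRESHOLD_MS = 6 * 60 * 60 * 1000        # illustrative six-hour threshold

    resp = requests.get(
        f"{HOST}/api/2.1/jobs/runs/list",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"active_only": "true"},
        timeout=30,
    )
    resp.raise_for_status()

    now_ms = int(time.time() * 1000)
    for run in resp.json().get("runs", []):
        if now_ms - run["start_time"] > THRESHOLD_MS:
            # In the real system this would feed an alert; here we just print.
            print(f"Long-running run {run['run_id']}: {run.get('run_name', 'unnamed')}")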
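A simplified PySpark sketch of the dependency-driven trigger pattern from the last bullet: the rollup runs only when the upstream partition for the target date looks complete, instead of firing on a fixed cron schedule. The table names, columns, and row-count threshold are illustrative assumptions.

    # Sketch only: gate a daily prevalence rollup on upstream completeness.
    import sys
    import datetime as dt
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    TARGET_DATE = str(dt.date.today() - dt.timedelta(days=1))
    MIN_EXPECTED_ROWS = 1_000_000            # illustrative completeness threshold

    upstream = spark.table("ingest_events").where(F.col("event_date") == TARGET_DATE)
    if upstream.count() < MIN_EXPECTED_ROWS:
        # Upstream ingest is not complete yet: skip instead of rolling up partial data.
        print(f"Upstream incomplete for {TARGET_DATE}; skipping rollup")
        sys.exit(0)

    # Dependency satisfied: compute the first-level prevalence rollup for the day.
    (upstream
        .groupBy("observable_id")
        .agg(F.countDistinct("endpoint_id").alias("prevalence"))
        .withColumn("event_date", F.lit(TARGET_DATE).cast("date"))
        .write.format("delta").mode("overwrite")
        .option("replaceWhere", f"event_date = '{TARGET_DATE}'")
        .saveAsTable("prevalence_daily"))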
Data Scientist - Full-Time CPT
Syntactech
01.2023 - 05.2023
Worked on marketing analytics initiatives using statistical modeling, ML techniques, and time-series forecasting.
Delivered actionable insights via automated dashboards, KPI tracking, A/B testing, and campaign analysis to support data-driven decisions.
Built forecasting models that enabled accurate sales planning and strategic goal setting.
Analyzed customer behavior to assess product impact, reduce churn, and enhance engagement strategies.
Developed strategies to optimize client channel placement and improve commercial account performance.
Built scalable revenue prediction models, helping drive long-term business planning.
Data Engineer - Part-Time On-Campus Role
GEP Worldwide
09.2022 - 12.2022
Automated data ingestion and parsing guide generation using Azure Data Factory and Databricks, enabling same-day client onboarding (down from 4 days).
Built end-to-end monitoring and error logging system with Azure Log Analytics and Power BI for real-time visibility.
Improved data processing efficiency through optimized ingestion logic using Databricks and Apache Kafka.
Revamped ETL with an automated framework, increasing data accuracy and reducing processing time.
Delivered data cube reports via Azure Data Factory and triggered Spark jobs within ADF pipelines for scalable processing.
Technology Consultant/Data Engineer
PricewaterhouseCoopers SDC
08.2019 - 08.2021
Worked on end-to-end development of real-time and batch data pipelines using Snowflake, Spark, Azure, and AWS, supporting user analytics, content recommendations, and enterprise reporting at scale.
Automated data ingestion, ETL frameworks, and monitoring systems across Azure Data Factory, Logic Apps, and AWS services, reducing processing time, improving data accuracy, and cutting operational costs.
Migrated critical pipelines from third-party tools (e.g., Informatica to native AWS) and built reusable frameworks for Salesforce, HR, and media data feeds, enabling secure, scalable, and cost-efficient ingestion.
Designed internal tools for query optimization, job monitoring, and real-time analytics using Hive, Elasticsearch, Cassandra, Django, and QuickSight, improving performance, data governance, and developer productivity.
Education
Master of Science - Data Science Computational Track