Principal Data Engineer with over 13 years of experience delivering data solutions across diverse industries to solve complex business challenges.
Passionate about converting raw data into actionable insights and compelling narratives, enhancing decision-making processes and driving organizational success.
Skilled at end-to-end data management, from extraction and transformation to producing final insights that shape strategic business decisions.
Expertise in designing and optimizing data pipelines and storage systems to ensure seamless data accessibility, scalability, and reliability.
Proficient in advanced analytics and visualization techniques, enabling businesses to gain deeper insights and leverage data effectively for competitive advantage.
Proven track record in leading cross-functional teams to execute large-scale, high-impact projects while ensuring adherence to stringent data governance and quality standards.
Experienced in using state-of-the-art technologies such as Apache Spark, AWS services, and data integration tools to streamline processes and introduce innovation in data workflows.
Known for driving strategic decision-making and continuous improvement, fostering sustainable organizational growth and enhancing operational efficiency.
Demonstrates a strong commitment to maintaining data integrity and security, ensuring compliance with organizational policies and industry regulations.
Adept at fostering a culture of collaboration and innovation, inspiring teams to develop creative, data-driven solutions that add measurable business value.
Overview
14 years of professional experience
2 Certificates
1 Language
Work History
Senior Data Engineer
Mobilityware
09.2023 - Current
Spearheaded the design and implementation of a modern Data Lakehouse using Apache Iceberg, enabling seamless data consolidation for strategic business decision-making.
Migrated Snowflake ETL pipelines to an open data lake on Iceberg, reducing costs by ~30% and eliminating vendor lock-in (illustrative sketch after this list).
Designed, developed, and optimized high-performance batch and real-time data pipelines using Apache Spark, Apache Flink, PySpark, Java, and Scala.
Built and maintained real-time data streaming applications using Apache Kafka and Apache Flink, ensuring low-latency, high-throughput processing.
Developed and optimized ETL workflows for structured and semi-structured data using AWS Glue, Redshift, and SQL-based transformations.
Enforced data governance frameworks, ensuring data integrity, accessibility, and compliance, achieving 99.9% accuracy in reporting.
Designed and implemented MWES meta-audit processes, introducing alerting mechanisms to detect and resolve pipeline issues proactively.
Leveraged AWS services like EMR, Redshift, Lambda, Glue, and S3 to build and manage scalable big data solutions.
Implemented Terraform-based infrastructure automation, ensuring repeatable, scalable, and cost-effective deployments in cloud environments.
Tuned Spark, Flink, and SQL queries to improve performance, reduce processing times, and optimize resource utilization.
Developed and optimized Apache Airflow DAGs, integrating with Kubernetes for containerized deployments, enhancing workflow scalability.
Provided technical leadership and mentorship to junior engineers, ensuring best practices in big data engineering, cloud computing, and performance optimization.
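To make the lakehouse work above concrete, here is a minimal PySpark sketch of the pattern described: registering an Iceberg catalog and re-landing raw Parquet as a partitioned Iceberg table. The catalog name, bucket, table, and column names are illustrative placeholders rather than details from this role, and the sketch assumes the Iceberg Spark runtime jar is on the classpath.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Spark session with an Iceberg catalog backed by an S3 warehouse path.
# Catalog and bucket names are hypothetical; real settings depend on the deployment.
spark = (
    SparkSession.builder
    .appName("iceberg-lakehouse-sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.type", "hadoop")
    .config("spark.sql.catalog.lakehouse.warehouse", "s3a://example-bucket/warehouse")
    .getOrCreate()
)

# Read raw Parquet exported from the legacy warehouse and re-land it as a
# partitioned Iceberg table; "event_date" is an illustrative partition column.
events = spark.read.parquet("s3a://example-bucket/raw/events/")
(
    events.writeTo("lakehouse.analytics.events")
    .partitionedBy(col("event_date"))
    .createOrReplace()
)

Because Iceberg is an open table format, the same files stay queryable from Spark, Flink, Trino, or Snowflake itself, which is what removes the vendor lock-in noted above.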
Data Platform Engineer
MasscomCorp
12.2022 - 08.2023
Integrated flexible data pipelines by building systems with dynamic schema handling capabilities, reducing manual intervention and ensuring immediate availability of new attributes for analytical use.
Led cost-optimization initiatives by migrating data processing to cost-efficient in-house solutions, reducing operational costs by 50%.
Developed real-time and batch processing applications using Spark, Scala, Kafka, and Cassandra to enhance data availability and decision-making speed (see the Kafka-to-Spark sketch after this list).
Leveraged cloud-native tools such as AWS Glue, Redshift, EMR, Lambda, and S3 to create scalable and efficient data workflows.
Designed and implemented automated meta-audit processes for end-to-end pipeline monitoring, ensuring data quality and leakage prevention with real-time alerts.
Built scalable data systems by designing high-volume, multi-threaded data processing architectures, enabling seamless handling of growing datasets while improving operational efficiency.
Created audience segmentation tools that empowered marketing teams to make targeted decisions, increasing ROI on ad campaigns.
Optimized ETL processes by designing and implementing robust pipelines using Redshift, Spark, and Python, ensuring efficient data transformation and integration across various sources.
Developed and maintained scalable DAGs in Apache Airflow for automated data workflows and ensured fault-tolerant execution in cloud environments.
Engineered real-time and batch data pipelines using Apache Spark, Apache Flink, PySpark, Java, and Scala, enhancing data processing capabilities.
Built low-latency streaming applications with Kafka and Flink, ensuring seamless real-time data ingestion and analytics.
Collaborated with Product Owners, Analytics Directors, and stakeholders to define and execute long-term data roadmaps and strategies, supporting business growth and innovation.
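As referenced above, a minimal sketch of the Kafka-to-Spark streaming pattern: Structured Streaming reads JSON events from a Kafka topic, parses them against an explicit schema, and lands them for analytics. The broker address, topic, schema fields, and S3 paths are hypothetical placeholders.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Explicit schema for the JSON payload; field names are illustrative.
event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("ts", TimestampType()),
])

# Subscribe to a Kafka topic; broker and topic names are placeholders.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")
    .option("subscribe", "events")
    .load()
)

# Kafka delivers bytes; cast the value to string and parse against the schema.
events = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
       .select("e.*")
)

# Land micro-batches as Parquet, with checkpointing for fault-tolerant restarts.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3a://example-bucket/streams/events/")
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()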
Tech Lead
Legato Health Technologies
01.2020 - 11.2022
Cloud-Based Data Pipelines: Guided the creation of scalable and efficient cloud-based data pipelines using Spark and Scala for high-performance data processing.
AWS Data Storage Optimization: Enhanced data storage and transformation capabilities on AWS S3 with Spark and EMR, ensuring optimized retrieval and processing speeds.
Snowflake Expertise: Revolutionized data storage and retrieval using Snowflake on AWS, leveraging advanced SQL queries to enhance operational efficiency and data access.
Cloud Migration Leadership: Spearheaded a migration from Teradata and Oracle to Snowflake, resulting in $2 million in annual savings and reduced data latency (a minimal load-step sketch follows this list).
Cost-Effective Cloud Redesign: Led the redesign of on-premises applications to AWS, achieving $2.68 million in annual savings and improving scalability and reliability.
Knowledge Sharing Platform: Developed a knowledge-sharing platform to enhance collaboration and foster continuous learning within the tech team.
Containerization: Introduced Docker-based containerization to improve application scalability, deployment speed, and overall system reliability.
System Security Enhancement: Led cross-functional teams in enhancing system security, reducing vulnerabilities by 40% through proactive risk assessments and mitigations.
Big Data & Analytics Expertise: Utilized Hadoop and Elasticsearch to process and analyze large datasets, driving actionable business insights.
End-to-End ETL Development: Designed and implemented robust ETL pipelines using modern tools, ensuring seamless integration across diverse systems and data sources.
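For the Snowflake migration bullet above, here is a minimal sketch, using the snowflake-connector-python package, of the kind of load step involved: copying staged Parquet files into a Snowflake table. The account, credentials, stage, and table names are placeholders, not details from the actual migration.

import snowflake.connector

# Connection parameters are placeholders; in practice these would come
# from a secrets manager rather than being hard-coded.
conn = snowflake.connector.connect(
    account="example_account",
    user="etl_user",
    password="***",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)

try:
    cur = conn.cursor()
    # COPY INTO pulls Parquet files from an external stage (backed by S3)
    # into a Snowflake table; the stage and table names are hypothetical.
    cur.execute("""
        COPY INTO claims_raw
        FROM @s3_claims_stage/2020/01/
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)
    print(cur.fetchall())  # per-file load results
finally:
    conn.close()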
Data Engineer
Tiger Analytics
08.2018 - 03.2020
Built robust data pipelines using Spark and Scala to streamline data ingestion from multiple sources, enabling seamless integration and processing.
Utilized Hadoop Distributed File System (HDFS), Sqoop, Hive, and Pig to manage and analyze large datasets, deriving actionable insights from complex data.
Developed real-time and batch processing applications using Spark/Scala, Kafka, and Cassandra, ensuring efficient data enrichment and transformation.
Leveraged AWS services, including EMR, Lambda, Redshift, Glue, EC2, and S3, to build scalable ETL pipelines and optimize computational operations.
Administered clusters within the Hadoop ecosystem and contributed to the successful migration from Hortonworks Data Platform (HDP) to Cloudera Data Platform (CDP).
Designed and implemented an ETL pipeline to extract Parquet files from S3 and persist them in HDFS, enhancing data accessibility and analysis capabilities (sketched after this list).
Utilized Jenkins and GitHub to automate CI/CD workflows, streamlining deployment processes for data engineering applications.
Collaborated with data scientists to develop tools and frameworks that support advanced analytics and machine learning workflows.
Designed reporting applications using Spark SQL and Hive, enabling efficient data querying and insights generation for business decision-making.
Ensured data separation and segregation according to policy requirements, maintaining compliance and enhancing data security across all processes.
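A minimal sketch of the S3-to-HDFS Parquet pipeline mentioned above; the bucket, paths, and partition column are illustrative placeholders.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("s3-to-hdfs-sketch")
    .getOrCreate()
)

# Pull Parquet files from S3 (via the s3a connector) and persist them in HDFS,
# partitioned by ingestion date so downstream Hive/Spark jobs can prune.
# "ingest_date" is an illustrative column assumed to exist in the data.
df = spark.read.parquet("s3a://example-bucket/exports/orders/")
(
    df.write
    .mode("overwrite")
    .partitionBy("ingest_date")
    .parquet("hdfs:///data/warehouse/orders/")
)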
Hadoop Developer
Accenture
07.2015 - 05.2018
Spearheaded the creation of a robust data pipeline using Sqoop, Pig, and Spark, significantly enhancing the ingestion and processing of large datasets.
Crafted and executed Spark scripts to perform complex data analysis, seamlessly processing and storing results in HDFS for downstream applications.
Designed and implemented Hive tables and HQL queries, streamlining data storage and enabling efficient report generation for business intelligence (see the sketch after this list).
Managed and optimized intricate Hive queries to enhance data retrieval performance and streamline storage solutions.
Led data transformation initiatives using Pig and Spark, producing cleaner and pre-aggregated datasets to facilitate accurate reporting and analysis.
Improved storage efficiency by designing effective schema structures in Hive, ensuring scalability and data integrity.
Implemented Spark-based workflows to handle large-scale data processing tasks, significantly reducing processing time and improving pipeline performance.
Enabled in-depth business insights by integrating advanced data reporting capabilities through the efficient use of HQL and Hive tables.
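To illustrate the Hive/HQL reporting pattern above, a minimal Spark SQL sketch that creates a partitioned Hive table and runs a typical pre-aggregated report; the database, table, and column names are hypothetical.

from pyspark.sql import SparkSession

# Hive-enabled session so Spark SQL can create and query Hive tables.
spark = (
    SparkSession.builder
    .appName("hive-reporting-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("CREATE DATABASE IF NOT EXISTS reporting")

# Partitioned Hive table for cleaned, pre-aggregated data.
spark.sql("""
    CREATE TABLE IF NOT EXISTS reporting.daily_sales (
        store_id STRING,
        revenue  DOUBLE
    )
    PARTITIONED BY (sale_date STRING)
    STORED AS PARQUET
""")

# Typical HQL-style report: revenue by store over a date range,
# relying on partition pruning on sale_date.
report = spark.sql("""
    SELECT store_id, SUM(revenue) AS total_revenue
    FROM reporting.daily_sales
    WHERE sale_date BETWEEN '2017-01-01' AND '2017-01-31'
    GROUP BY store_id
    ORDER BY total_revenue DESC
""")
report.show()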
Mainframe Developer
TESCO HSC
11.2014 - 06.2015
Cross-Functional Team Collaboration: Partnered with diverse teams to deliver high-quality mainframe software projects on deadline, ensuring seamless coordination and execution.
Coding Standards Advocacy: Promoted and implemented best practices in coding standards and documentation, enhancing the consistency and quality of team deliverables.
Industry Trends Expertise: Stayed abreast of emerging industry trends, integrating new methodologies to improve programming practices and project outcomes.
Process Optimization: Streamlined workflows and optimized performance by conducting detailed code reviews and addressing critical debugging requirements.
Quality Assurance Leadership: Championed a culture of excellence by establishing rigorous coding standards and continuous improvement processes for superior project quality.
Mainframe Developer
UST Global
08.2011 - 10.2014
Team Leadership and Mentorship: Provided expert guidance and mentorship to team members, fostering skill development and ensuring smooth project execution.
Issue Resolution: Monitored job execution closely and proactively resolved technical issues, maintaining project momentum.
Implementation Support: Conducted post-implementation reviews and provided continuous support during implementation phases, ensuring seamless transitions and stability.
Comprehensive Testing: Developed detailed test plans, executed rigorous testing processes, and documented precise test results to ensure system reliability and quality.
Continuous Improvement: Leveraged feedback from post-implementation reviews to enhance processes and deliverables, driving consistent project success.
Education
Bachelor's Degree
Jawaharlal Nehru University
05.2005 - 05.2009
Skills
Apache Spark
Certification
AWS Certified Solutions Architect - Associate
Work Availability
Monday through Sunday; morning, afternoon, and evening
Languages
English
Accomplishments
Successfully transitioned Treasure Data systems to an in-house solution, reducing operational costs by 50% while ensuring seamless data integration.
Migrated the Snowflake ETL to an open data lake in Iceberg, reducing costs by approximately 30% and eliminating data warehouse lock-in contracts.
Led the redesign of on-premises applications to AWS, achieving $2.68 million in annual savings and improving system scalability and reliability.
Work Preference
Work Type
Contract Work, Part Time
Work Location
Remote, Hybrid
Important To Me
Work from home option, Flexible work hours, Career advancement, Personal development programs