Swapnil Vikas Meher

Houston, TX

Summary

Over 10 years of IT experience across on-premises and cloud environments, specializing in Big Data technologies and cloud platforms. Skilled in designing and developing scalable data pipelines for migrating and processing data using Databricks and other big data tools. Extensive experience with data extraction, cleaning, processing, and pipeline creation using cloud services including:
  • AWS: S3, Glue, EMR, Lambda, CloudFormation, EC2, Secrets Manager, Athena
  • Azure: Azure Data Factory (ADF), Blob Storage, Azure Databricks

Successfully created and migrated AWS applications across multiple AWS accounts. Developed EMR and Glue scripts for batch processing in AWS, and Python utilities for data pipeline development and automation. Engaged in product design and implementation, focusing on data cleaning, standardization, duplicate identification, and merging scenarios. Hands-on expertise in production environment management, including proactive monitoring, debugging, issue mitigation, and fixes. Strong background in Hadoop ecosystem technologies, primarily with Cloudera Distribution (CDH), including development, testing, and deployment in distributed environments.

Collaborated closely with clients and developers to prepare test plans and scripts ensuring high-quality software delivery, and delivered comprehensive unit testing plans and documentation to ensure robust software quality. Demonstrated excellent interpersonal, analytical, and relationship-building skills with a strong process-oriented approach to meet cost, profit, service, and organizational goals. Proven leadership abilities with strong communication skills and experience motivating teams and collaborating with upper management.

Overview

10 years of professional experience
1 Certification

Work History

Principal Data Engineer

Murphy Oil Corporation
Houston, TX
06.2025 - Current
  • Optimized existing data ingestion pipelines and redeveloped workflows to enhance performance and reliability. Built a centralized data lake environment to streamline data access, improve governance, and support advanced analytics at Murphy Oil Corporation.
  • Created new data ingestion pipelines using Azure Data Factory and Databricks, following the medallion architecture (bronze, silver, gold layers).
  • Developed scalable data ingestion and transformation scripts using PySpark in Databricks.
  • Engaged with business users to understand data challenges and deliver tailored solutions.
  • Contributed to the Power BI to Microsoft Fabric capacity migration, improving report performance and scalability.
  • Technologies Used: Azure Databricks, ADF, Azure DevOps.

Lead Data Engineer

Bizmetric India Pvt Ltd
India
06.2023 - 05.2025
  • This project involved successfully migrating and optimizing the data warehouse infrastructure by leveraging Databricks capabilities, resulting in enhanced data accessibility, improved query performance, and strengthened data security.
  • Conducted comprehensive analysis of the existing data warehouse infrastructure, identifying pain points and improvement opportunities.
  • Collaborated with cross-functional teams to design and implement a data migration strategy from on-premises data sources to the Databricks platform.
  • Developed efficient ETL pipelines to extract, transform, and load data from diverse sources into Databricks, ensuring data integrity and accuracy.
  • Optimized data models and schemas within Databricks to improve query performance and reduce storage costs.
  • Created and integrated Databricks workflows with Jenkins to establish CI/CD pipelines for automated deployment.
  • Technologies Used: AWS Databricks, Redshift, Jenkins.

Sr. Data Engineer

Bizmetric India Pvt Ltd
12.2022 - 06.2023
  • This project involved creating a data lake on Azure Databricks.
  • Designed and developed the project architecture.
  • Created data ingestion pipelines in Azure Data Factory (ADF) to pull data from external sources such as SQL Server.
  • Developed PySpark scripts in Databricks to move and transform data across different data layers following the medallion architecture.
  • Conducted peer code reviews to ensure code quality and best practices.
  • Collaborated with the reporting team to design and develop appropriate data models for analytics.
  • Acted in a Business Intelligence (BI) capacity, gathering requirements from stakeholders for developing the gold data layer.
  • Mentored junior developers and provided day-to-day technical guidance and support.
  • Technologies Used: ADF, Azure Blob Storage, Azure Databricks, SQL Server.

Sr. Data Engineer

Bizmetric India Pvt Ltd
01.2021 - 12.2022
  • Reed Exhibitions. The purpose of this project was to create a centralized data hub. The applications were hosted across various AWS accounts, and the main goal was to migrate them all to a centralized account and to ingest data from various other sources into the centralized data hub account.
  • Analyzed the existing architecture of the applications running in the source environment.
  • Implemented the required code in Scala and Python.
  • Developed and migrated the applications in the centralized account.
  • Updated existing code to match the centralized data architecture.
  • Created CloudFormation templates for migrated applications.
  • Created Python scripts for Glue job implementations.
  • Created and updated AWS Glue jobs and Spark jobs to run on AWS EMR.
  • Created and managed Lake Formation permissions for hosted services.
  • Designed and engineered big data solutions and developed a modern data analytics lake.
  • Liaised with the business team and technical leads to gather requirements and identify data sources.
  • Designed and developed distributed, high-throughput, low-latency data processing and data systems.
  • Technologies Used: AWS Glue, EMR, Athena, Lambda, CloudFormation, S3, EC2, Redshift; GCP: BigQuery, Dataproc.

Big Data Engineer

Tata Consultancy Services
03.2016 - 12.2020
  • Equifax Inc. is a consumer credit reporting agency in the United Kingdom, considered one of the three largest credit agencies along with Experian and TransUnion. Founded in 1899, Equifax is the oldest of the three agencies and gathers and maintains information on over 800 million consumers and more than 88 million businesses worldwide. The objective of the project was to create a data lake. Having all the data on one platform enabled analytics and product design. Multiple existing products were later planned for migration to Hadoop from mainframe, SAS, and other systems. Almost 50 datasets were loaded on daily, weekly, monthly, or as-available schedules. The existing credit score calculation module was moved to Hadoop.
  • Implemented specifications as purely functional Apache Spark jobs.
  • Loaded different kinds of datasets into the Hadoop cluster as external Hive tables.
  • Wrote shell scripts to create dynamic external Hive tables by reading the variable-sized header of a CSV file or an Avro schema, and to load data into those tables.
  • Automated the code release and RPM creation process through GoCD pipelines and scheduled those jobs through the pipelines.
  • Set up a GoCD pipeline for builds, pull request code reviews, and PR approvals.
  • Wrote regression test scripts for the above-mentioned frameworks and scheduled their runs through GoCD pipelines.
  • Tracked defects using JIRA.
  • Wrote unit test cases using ScalaTest (FlatSpec).
  • Studied the project flow and improved processes for better quality and productivity.

Education

Bachelor of Engineering - Electronics & Telecommunications Engineering

Mumbai University
01.2015

Skills

  • Cloud Platform: AWS (EC2, S3, Athena, CloudFormation, Glue, Redshift, EMR); Azure (Databricks, ADF, Blob Storage, Azure SQL)
  • Database: MySQL, PostgreSQL
  • Programming Language: Scala, Java, Python
  • Big Data Knowledge: Hadoop, Sqoop, Hive, HDFS, Apache Spark, Kafka
  • Technical Proficiency: Big Data and Analytics
  • CI/CD: Git, Bitbucket, GoCD, Jenkins
  • Scheduling Tool: Oozie, Airflow, Databricks Workflows

Certification

Databricks Certified Data Engineer Professional
