8+ years of total IT experience, with extensive experience steering Big Data projects from inception to delivery; passionate about turning data into products, actionable insights, and meaningful stories.
Driven Big Data developer with experience performing complex integrations while developing code. Enthusiastic technical professional with a background in supporting administrators during configuration and deployment, and a strong history of accuracy in deadline-driven environments.
Excellent experience in application development and maintenance of projects using technologies such as Python, Scala, data structures, and UNIX shell scripting.
Expertise in all Hadoop Ecosystem components- Hive, Hue, Pig, Sqoop, Impala, Flume, Zookeeper, Oozie, Airflow, and Apache Spark.
Expertise in Creating, Debugging, Scheduling, and Monitoring jobs using Airflow and Oozie.
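Both schedulers mentioned above model a job as a DAG of dependent tasks. As an illustrative sketch only (not Airflow or Oozie code; the task names are hypothetical), a dependency-respecting run order can be computed with a topological sort:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: transform and validate both depend on
# extract, and load is gated on both of them -- the same shape a
# scheduler DAG would encode.
deps = {
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}

# static_order() yields tasks so every dependency runs first.
order = list(TopologicalSorter(deps).static_order())
```

In Airflow the same dependencies would be declared on task objects (e.g. with `>>`), and the scheduler performs this ordering for you.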
Sound experience across the Big Data Hadoop ecosystem in ingestion, storage, querying, processing, and analysis of big data.
Excellent understanding/knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, and Task Tracker.
Experience in Spark Streaming in order to ingest real-time data from multiple data sources into HDFS.
In-depth understanding of Snowflake as SaaS cloud technology.
In-depth knowledge of Snowflake databases, schemas, table structures, credit usage, multi-cluster warehouses, data sharing, stored procedures, and UDFs in Snowflake.
Expertise in designing clustered tables on the Snowflake database to improve query performance for consumers.
Experience in using Snowflake Clone and Time Travel.
Hands-on experience with Snowflake utilities, SnowSQL, Snowpipe, and big data modeling techniques using Python.
Built ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL; wrote SQL queries against Snowflake.
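As a hedged illustration of the Python-plus-SQL ETL pattern described above (not code from any of the projects below; sqlite3 from the standard library stands in for a Snowflake connection, and the table and column names are invented):

```python
import sqlite3

def load_and_transform(rows):
    """Stage raw (id, amount) rows, then aggregate them with SQL.

    Illustrative only: sqlite3 stands in for Snowflake here. In
    SnowSQL the analogous pattern would be a COPY INTO a staging
    table followed by an INSERT ... SELECT transformation.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)
    # Transform step: total amount per id, ordered for stable output.
    cur = conn.execute(
        "SELECT id, SUM(amount) FROM staging GROUP BY id ORDER BY id"
    )
    return cur.fetchall()
```

For example, `load_and_transform([(1, 10.0), (1, 5.0), (2, 3.0)])` returns `[(1, 15.0), (2, 3.0)]`.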
Overview
10 years of professional experience
1 Certification
Work History
Senior Data Engineer
S&P GLOBAL
10.2022 - Current
Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
Designed and implemented effective database solutions and models to store and retrieve data.
Prepared documentation and analytic reports, delivering summarized results, analysis, and conclusions to stakeholders.
Developed various data loading strategies and performed various transformations for analyzing datasets by using Cloudera Distribution for the Hadoop ecosystem.
Worked extensively on designing and developing multiple Spark Scala ingestion pipelines, both real-time and batch.
Responsible for handling large datasets during the ingestion process itself, using partitions, Spark in-memory capabilities, broadcasts in Spark, effective and efficient joins, transformations, and other techniques.
Developed generic stored procedures using SnowSQL and JavaScript to transform and ingest CDC data into Snowflake relational tables from external S3 stages.
Worked on a prototype to create an external function in Snowflake that calls a remote service implemented in AWS Lambda.
Developed multiple POCs using Spark and deployed them on the YARN cluster; compared the performance of Spark with Hive and Impala.
Developed generic stored procedures using SnowSQL and JavaScript to transform and ingest CDC data into Snowflake relational tables via external stages on GCP Storage.
Created Snowpipe for continuous data loading; set up data sharing between two Snowflake accounts; created internal and external stages and transformed data during load.
Redesigned views in Snowflake to improve query performance.
Involved in testing Snowflake to understand the best possible way to use the cloud resources.
Developed Spark code in Python and Spark SQL for faster testing and processing of data; loaded data into Spark RDDs and performed in-memory computation to generate output with lower memory usage.
Big Data Engineer
Vanguard
11.2021 - 09.2022
Worked in complete big data flow of multiple applications, starting from data ingestion from upstream to HDFS, processing, and analyzing data in HDFS.
Developed various data loading strategies and performed various transformations for analyzing datasets by using Cloudera Distribution for the Hadoop ecosystem.
Worked on designing and developing multiple PySpark ingestion pipelines, both real-time and batch.
Developed multiple POCs using Spark and deployed them on the YARN cluster; compared the performance of Spark with Hive and Impala.
Worked on Batch processing for History load and Real-time data processing for consuming live data on Spark Streaming using Lambda architecture.
Developed a Streaming pipeline to consume data from Kafka and ingest it into HDFS in near real-time.
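The micro-batching idea behind a near-real-time Kafka-to-HDFS pipeline like the one above can be sketched in plain Python. This is illustrative only: the class name and thresholds are invented, a real consumer would use a Kafka client library, and the sink would write to HDFS rather than call a local function:

```python
class MicroBatcher:
    """Buffer incoming records and flush them in fixed-size batches,
    mimicking the consume-from-Kafka, land-in-HDFS pattern."""

    def __init__(self, batch_size, sink):
        self.batch_size = batch_size
        self.sink = sink          # callable that persists one batch
        self.buffer = []
        self.flushed = 0          # number of batches written so far

    def consume(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Persist whatever is buffered, if anything, as one batch.
        if self.buffer:
            self.sink(list(self.buffer))
            self.flushed += 1
            self.buffer.clear()

batches = []
mb = MicroBatcher(batch_size=3, sink=batches.append)
for msg in ["a", "b", "c", "d"]:
    mb.consume(msg)
mb.flush()  # flush the partial tail batch
```

In a streaming framework such as Spark Streaming, this buffering and flushing is handled by the batch interval rather than hand-written code.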
Performed tuning of Spark Streaming applications, setting the right batch interval, the correct level of parallelism, and memory configuration.
Implemented Spark SQL optimized joins to gather data from different sources and run ad-hoc queries on top of them.
Worked on parsing and converting JSON/XML formatted files to tabular format in Hive/Impala by using PySpark, Spark SQL, and the DataFrame API.
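The JSON-to-tabular conversion described above can be sketched in plain Python as a hedged stand-in for the PySpark/DataFrame version (the `flatten` helper and the sample fields are illustrative, not from the project):

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested JSON into one flat dict with dotted keys --
    the shape a tabular Hive/Impala row expects."""
    row = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the column name.
            row.update(flatten(value, name + "."))
        else:
            row[name] = value
    return row

record = json.loads('{"id": 1, "user": {"name": "a", "age": 30}}')
flatten(record)  # {'id': 1, 'user.name': 'a', 'user.age': 30}
```

In PySpark the equivalent is typically done by selecting nested columns (e.g. `user.name`) into top-level columns of a DataFrame.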
Worked with various file formats (Text, JSON, XML, Avro, Parquet) and compression codecs (Snappy, Bzip2, Gzip).
Hadoop/Spark Developer
CARDINAL HEALTH
11.2020 - 11.2021
Explored Spark for improving performance and optimizing existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and Spark on YARN.
Worked on POCs with Apache Spark using Python to implement Spark in a project.
Built scalable distributed data solutions using the Hadoop Cloudera Distribution.
Loaded and transformed large sets of structured, semi-structured, and unstructured data.
Implemented Partitioning, Dynamic Partitions, and Buckets in Hive.
Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregating Functions (UDAFs) written in Python.
Developed Hadoop streaming jobs to process terabytes of JSON/XML format data.
Developed complex MapReduce streaming jobs in Java, along with Hive and Pig programs, to perform various ETL, cleaning, and scrubbing tasks.
Designed an ETL-run performance tracking sheet in different phases of the project and shared it with the Production team.
Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting; applied Hive optimization techniques during joins and best practices when writing HiveQL scripts.
Demonstrated Hadoop practices and broad knowledge of technical solutions, design patterns, and code for medium/complex applications deployed in Hadoop production.
Inspected and analyzed existing Hadoop environments for proposed product launches, producing cost/benefit analyses on reusing legacy assets.
Data Engineer
Tata Consultancy Services
04.2014 - 10.2018
Developed highly maintainable Hadoop code and followed all best practices regarding coding.
Analyzed and gathered business requirement specifications by interacting with clients and studying business requirement specification documents.
Met with key stakeholders to discuss and understand project scope, tasks required, and deadlines.
Installed and configured Hadoop MapReduce, HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
Developed complex MapReduce streaming jobs in Java, along with Hive and Pig programs, to perform various ETL, cleaning, and scrubbing tasks.
Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports per users' needs.
Developed code in Hadoop technologies and performed unit testing.
Involved in creating Hive tables, loading structured data, and writing Hive queries, which run internally as MapReduce jobs.
Created PIG scripts to load, transform, and store data from various sources into HIVE metastore.
Worked in a MySQL database on simple queries and wrote stored procedures for normalization and denormalization.
Education
Bachelor of Science - Electrical And Instrumentation Engineering