Mounika Gorthi

Maineville, OH

Summary

  • 8+ years of total IT experience, with extensive experience steering Big Data projects from inception to delivery; passionate about turning data into products, actionable insights, and meaningful stories.
  • Driven Big Data developer experienced in performing complex integrations while writing code; enthusiastic technical professional with a background in supporting administrators during configuration and deployment and a strong record of accuracy in deadline-driven environments.
  • Excellent experience in application development and maintenance of projects using technologies such as Python, Scala, data structures, and UNIX shell scripting.
  • Expertise in all Hadoop ecosystem components: Hive, Hue, Pig, Sqoop, Impala, Flume, Zookeeper, Oozie, Airflow, and Apache Spark.
  • Expertise in creating, debugging, scheduling, and monitoring jobs using Airflow and Oozie (a scheduling sketch follows this list).
  • Sound experience across the Hadoop ecosystem in ingestion, storage, querying, processing, and analysis of big data.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, Job Tracker, and Task Tracker.
  • Experience with Spark Streaming to ingest real-time data from multiple sources into HDFS.
  • In-depth understanding of Snowflake as a SaaS cloud technology.
  • In-depth knowledge of Snowflake databases, schemas, table structures, credit usage, multi-cluster warehouses, data sharing, stored procedures, and UDFs.
  • Expertise in designing clustered tables on the Snowflake database to improve query performance for consumers.
  • Experience in using Snowflake Clone and Time Travel.
  • Hands-on experience with Snowflake utilities, SnowSQL, Snowpipe, and big data modeling techniques using Python.
  • Built ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake (a loading sketch follows this list).
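For illustration, a minimal Airflow DAG of the kind referenced above, assuming Airflow 2.x; the DAG id, commands, paths, and table name are hypothetical placeholders rather than code from any actual project.

```python
# Minimal Airflow DAG sketch (illustrative only): a daily ingestion job followed by a
# Hive partition refresh. All ids, commands, and paths below are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_ingestion_example",     # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="ingest_to_hdfs",
        bash_command="spark-submit /opt/jobs/ingest.py",          # placeholder path
    )
    refresh = BashOperator(
        task_id="refresh_hive_partitions",
        bash_command="hive -e 'MSCK REPAIR TABLE raw.events'",    # placeholder table
    )
    ingest >> refresh  # run the Hive refresh only after ingestion succeeds
```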
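And a minimal sketch of the Python-plus-SnowSQL loading pattern from the last bullet, assuming the snowflake-connector-python package and a pre-created external stage; the account, credentials, stage, and table names are hypothetical placeholders.

```python
# Minimal sketch of loading staged files into Snowflake from Python (illustrative only).
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",      # placeholder
    user="etl_user",           # placeholder
    password="***",            # use a secrets manager in practice
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="RAW",
)

try:
    cur = conn.cursor()
    # COPY INTO pulls files from a pre-created external stage (e.g. on S3) into a table.
    cur.execute("""
        COPY INTO RAW.ORDERS
        FROM @RAW.S3_ORDERS_STAGE
        FILE_FORMAT = (TYPE = 'PARQUET')
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)
    # Follow-up SQL query against the freshly loaded table.
    cur.execute("SELECT COUNT(*) FROM RAW.ORDERS")
    print(cur.fetchone()[0])
finally:
    conn.close()
```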

Overview

10 years of professional experience
2 Certifications

Work History

Senior Data Engineer

S&P Global
10.2022 - Current
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Designed and implemented effective database solutions and models to store and retrieve data.
  • Prepared documentation and analytic reports, delivering summarized results, analysis, and conclusions to stakeholders.
  • Developed various data loading strategies and performed various transformations for analyzing datasets by using Cloudera Distribution for the Hadoop ecosystem.
  • Worked extensively on designing and developing multiple Spark Scala ingestion pipelines, both real-time and batch.
  • Responsible for handling large datasets using partitions, Spark in-memory capabilities, broadcasts, and effective and efficient joins and transformations during the ingestion process itself.
  • Developed generic stored procedures using SnowSQL and JavaScript to transform and ingest CDC data into Snowflake relational tables from external S3 stages.
  • Worked on a prototype to create an external function in Snowflake that calls a remote service implemented in AWS Lambda.
  • Developed multiple POCs using Spark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and Impala.
  • Developed generic stored procedures using SnowSQL and JavaScript to transform and ingest CDC data into Snowflake relational tables via external stages on GCP Storage.
  • Created Snowpipe for continuous data loads, set up data sharing between two Snowflake accounts, and created internal and external stages, transforming data during load.
  • Redesigned views in Snowflake to improve performance.
  • Involved in testing Snowflake to understand the best possible way to use the cloud resources.
  • Developed Spark code in Python and the Spark SQL environment for faster testing and processing of data, loading data into Spark RDDs and doing in-memory computation to generate output with lower memory usage (a PySpark sketch follows this list).
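A minimal PySpark sketch of the ingestion pattern described above, combining a broadcast join with a partitioned write; the paths, column names, and app name are hypothetical placeholders, not actual pipeline code.

```python
# Minimal PySpark sketch (illustrative only): enrich a large dataset with a small
# reference table via a broadcast join, then write date-partitioned Parquet.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast, to_date

spark = SparkSession.builder.appName("ingestion_example").getOrCreate()

# Large fact data read from HDFS (placeholder paths and columns).
trades = spark.read.parquet("hdfs:///data/raw/trades")

# Small dimension table: broadcasting it avoids a shuffle-heavy join.
instruments = spark.read.parquet("hdfs:///data/ref/instruments")

enriched = (
    trades.join(broadcast(instruments), on="instrument_id", how="left")
          .withColumn("trade_date", to_date("trade_ts"))
)

# Partition the output by date so downstream queries can prune partitions.
(enriched.repartition("trade_date")
         .write.mode("overwrite")
         .partitionBy("trade_date")
         .parquet("hdfs:///data/curated/trades"))
```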

Big Data Engineer

Vanguard
11.2021 - 09.2022
  • Worked in complete big data flow of multiple applications, starting from data ingestion from upstream to HDFS, processing, and analyzing data in HDFS.
  • Developed various data loading strategies and performed various transformations for analyzing datasets by using Cloudera Distribution for the Hadoop ecosystem.
  • Worked on designing and developing multiple PySpark ingestion pipelines, both real-time and batch.
  • Developed multiple POCs using Spark and deployed them on the Yarn cluster, compared the Performance of Spark with Hive and Impala.
  • Worked on Batch processing for History load and Real-time data processing for consuming live data on Spark Streaming using Lambda architecture.
  • Developed a Streaming pipeline to consume data from Kafka and ingest it into HDFS in near real-time.
  • Tuned Spark Streaming applications for the right batch interval, the correct level of parallelism, and memory usage.
  • Implemented Spark SQL optimized joins to gather data from different sources and run ad-hoc queries on top of them.
  • Worked on parsing and converting JSON/XML formatted files to tabular format in Hive/Impala using PySpark, Spark SQL, and the DataFrame API (a structured-streaming sketch follows this list).
  • Worked with various file formats (text, JSON, XML, Avro, Parquet) and compression codecs (Snappy, Bzip2, Gzip).
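A minimal Spark Structured Streaming sketch of the Kafka-to-HDFS pattern described above, assuming the spark-sql-kafka connector is on the classpath; the broker address, topic, schema, and paths are hypothetical placeholders.

```python
# Minimal Structured Streaming sketch (illustrative only): consume JSON events from
# Kafka, flatten them to a tabular schema, and land them on HDFS in near real-time.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka_to_hdfs_example").getOrCreate()

event_schema = StructType([
    StructField("account_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
            .option("subscribe", "account-events")               # placeholder topic
            .load())

# Kafka delivers the payload as bytes; cast to string and parse the JSON body.
events = (raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
             .select("e.*"))

query = (events.writeStream
               .format("parquet")
               .option("path", "hdfs:///data/streams/account_events")          # placeholder
               .option("checkpointLocation", "hdfs:///checkpoints/account_events")
               .outputMode("append")
               .start())

query.awaitTermination()
```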

Hadoop/Spark Developer

Cardinal Health
11.2020 - 11.2021
  • Explored Spark for improving performance and optimizing existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and Spark on YARN.
  • Worked on POCs with Apache Spark using Python to implement Spark in a project.
  • Built scalable distributed data solutions using the Cloudera Hadoop distribution.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Implemented Partitioning, Dynamic Partitions, and Buckets in Hive.
  • Extended Hive and Pig core functionality using custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregating Functions (UDAFs) written in Python.
  • Developed Hadoop streaming jobs to process terabytes of JSON/XML format data (a streaming-mapper sketch follows this list).
  • Developed complex MapReduce streaming jobs in Java, alongside Hive and Pig, to perform various ETL, cleaning, and scrubbing tasks.
  • Designed an ETL-run performance tracking sheet in different phases of the project and shared it with the Production team.
  • Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting, applying Hive optimization techniques during joins and best practices in writing Hive scripts with HiveQL.
  • Demonstrated Hadoop practices and broad knowledge of technical solutions, design patterns, and code for medium/complex applications deployed in Hadoop production.
  • Inspected and analyzed existing Hadoop environments for proposed product launches, producing cost/benefit analyses on reusing existing legacy assets.
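A minimal sketch of a Hadoop streaming mapper of the kind described above, written in Python; the JSON field names are hypothetical placeholders. Such a mapper is paired with a reducer and submitted through the hadoop-streaming jar.

```python
#!/usr/bin/env python
# Minimal Hadoop streaming mapper sketch (illustrative only): read JSON records from
# stdin, keep the fields of interest, and emit tab-separated key/value lines.
# The field names are hypothetical placeholders.
import json
import sys

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    try:
        record = json.loads(line)
    except ValueError:
        # Skip malformed records instead of failing the whole task.
        continue
    order_id = record.get("order_id", "")
    status = record.get("status", "UNKNOWN").upper()
    # key<TAB>value pairs are what the streaming framework passes on to the reducer.
    print("%s\t%s" % (status, order_id))
```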

Data Engineer

Tata Consultancy Services
04.2014 - 10.2018
  • Developed highly maintainable Hadoop code and followed all best practices regarding coding.
  • Analyzed and gathered business requirements by interacting with the client and reviewing business requirement specification documents.
  • Met with key stakeholders to discuss and understand project scope, tasks required, and deadlines.
  • Installed and configured Hadoop MapReduce, HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Developed complex MapReduce streaming jobs in Java, along with Hive and Pig programs, to perform various ETL, cleaning, and scrubbing tasks.
  • Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports per users' needs (a streaming-reducer sketch follows this list).
  • Developed code in Hadoop technologies and performed unit testing.
  • Involved in creating Hive tables, loading structured data, and writing Hive queries, which run internally as MapReduce.
  • Created Pig scripts to load, transform, and store data from various sources into the Hive metastore.
  • Worked with MySQL databases on queries and wrote stored procedures for normalization and denormalization.
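A minimal sketch of the reducer side of such a streaming job, in Python: Hadoop streaming delivers input sorted by key, so per-key counts can be accumulated and flushed on key change. The key layout is a hypothetical placeholder for a daily-report job.

```python
#!/usr/bin/env python
# Minimal Hadoop streaming reducer sketch (illustrative only): input arrives sorted by
# key (here a report date), so counts are accumulated and emitted on each key change.
import sys

current_day = None
count = 0

for line in sys.stdin:
    day = line.rstrip("\n").split("\t", 1)[0]
    if day != current_day:
        if current_day is not None:
            print("%s\t%d" % (current_day, count))
        current_day = day
        count = 0
    count += 1

# Flush the final key.
if current_day is not None:
    print("%s\t%d" % (current_day, count))
```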

Education

Bachelor of Science - Electrical and Instrumentation Engineering

JNTU
Anantapur, AP, India
05.2014

Skills

  • Cloud Technologies: GCP, BigQuery, AWS, EC2, S3, VPC, Lambda, Redshift, EMR, Azure, Azure Data Factory, Azure Blob Storage, Snowflake, Databricks
  • Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Hue, Sqoop, Storm, Kafka, Oozie, Airflow, Spark SQL, Spark Streaming, PySpark, Flume, Zookeeper, Cassandra, Spark, Cloudera, Delta Lake, Jupyter/Notebook, Zeppelin
  • Databases: Oracle, MySQL, SQL Server, DB2 for Mainframes, Familiar with NoSQL (HBase, Cassandra, MongoDB)
  • Scripting & Query Languages: Python, UNIX shell scripting, SQL, SnowSQL

Certification

  • SnowPro Core Certification
  • GCP Cloud Engineering Associate Certification
