To work in an organization as a collaborative data engineer, making use of my substantial knowledge in designing and executing solutions for complex business problems involving large-scale data warehousing, real-time analytics, and reporting. Known for using the right tools when and where they make sense, and for creating intuitive architectures that help organizations effectively analyze and process terabytes of structured and unstructured data.
An IT professional with 5+ years of experience in software development and the implementation of Big Data technologies.
Proficient in working with Big Data and the Hadoop Distributed File System (HDFS).
Proficient in AWS EMR (Hadoop and Spark), S3, Athena, and Snowflake.
In-depth understanding of Apache Spark architecture; performed several batch operations using Spark (Core, SQL), RDDs, and DataFrames.
Implemented Spark jobs using Python and Spark SQL for faster testing and processing of data (see the first sketch at the end of this summary).
Expertise in Python programming.
Experience in working with Eco systems like Hive.
Performed Hive operations on large datasets, with proficiency in writing HiveQL queries using transactional and performance-efficient concepts: partitioning, bucketing, and efficient and effective join operations (see the second sketch at the end of this summary).
Experienced with different file formats such as Parquet, ORC, Avro, SequenceFile, CSV, JSON, and plain text files.
Scheduled jobs and automated workflows using AWS CloudWatch and Control-M.
Experience with Agile development and the Scrum process.
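A minimal sketch of the kind of PySpark batch job described above, combining the DataFrame API with an equivalent Spark SQL query; all paths, tables, and column names are illustrative placeholders, not details from any client engagement.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-example").getOrCreate()

# Read a Parquet dataset into a DataFrame.
df = spark.read.parquet("s3://example-bucket/input/transactions/")

# DataFrame API: filter and aggregate.
daily = (df.filter(F.col("amount") > 0)
           .groupBy("txn_date")
           .agg(F.sum("amount").alias("total_amount")))

# The same logic expressed in Spark SQL over a temporary view.
df.createOrReplaceTempView("transactions")
daily_sql = spark.sql("""
    SELECT txn_date, SUM(amount) AS total_amount
    FROM transactions
    WHERE amount > 0
    GROUP BY txn_date
""")

daily.write.mode("overwrite").parquet("s3://example-bucket/output/daily_totals/")
```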
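A second sketch illustrating the partitioning and bucketing concepts mentioned above, issued as HiveQL through PySpark; the tables, columns, and bucket count are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-ddl-example")
         .enableHiveSupport()
         .getOrCreate())

# Partitioning by txn_date lets queries that filter on the date prune whole
# directories; bucketing by customer_id lets joins on that key avoid a full
# shuffle. Table, columns, and bucket count are hypothetical.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales (
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (txn_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# A join that benefits from bucketing on customer_id (the customers table
# is assumed to exist with a matching customer_id column).
result = spark.sql("""
    SELECT s.customer_id, c.segment, SUM(s.amount) AS total
    FROM sales s
    JOIN customers c ON s.customer_id = c.customer_id
    WHERE s.txn_date = '2021-01-01'
    GROUP BY s.customer_id, c.segment
""")
result.show()
```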
Overview
5 years of professional experience
1 certification
Work History
Data Engineer
IBM
Hyderabad
01.2022 - 03.2023
Client: American Express - The American Express Company (Amex) is an American multinational corporation specializing in payment card services, headquartered at 200 Vesey Street in the Battery Park City neighborhood of Lower Manhattan in New York City. The company was founded in 1850 and is one of the 30 components of the Dow Jones Industrial Average. The company's logo, adopted in 1958, is a gladiator or centurion whose image appears on the company's well-known traveler's cheques, charge cards, and credit cards.
Project Description: Export blue - Global taxation: generation of global tax audit reports containing transaction and merchant data from Amex credit card usage
Responsibilities: Designed report-generation applications using PySpark and formatted the data per business requirements
Built tax-related PySpark applications that are used by the business
Automated the PySpark scripts for retrieval of yearly audit data
Created Hive external tables over data in HDFS (see the sketch below)
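A hypothetical sketch of what a yearly tax-audit report job like those described above might look like in PySpark; the database, tables, column names, and output path are placeholders, not Amex internals.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("global-tax-report")
         .enableHiveSupport()
         .getOrCreate())

year = "2022"  # in production this would come from the scheduler

# Read transaction and merchant data from (hypothetical) Hive tables.
txns = spark.table("audit.transactions").filter(F.col("txn_year") == year)
merchants = spark.table("audit.merchants")

# Join and aggregate per the assumed reporting requirement.
report = (txns.join(merchants, "merchant_id")
              .groupBy("merchant_id", "merchant_name", "country")
              .agg(F.round(F.sum("tax_amount"), 2).alias("total_tax")))

# Write the formatted report as a single CSV for audit consumers.
(report.coalesce(1)
       .write.mode("overwrite")
       .option("header", True)
       .csv(f"hdfs:///reports/global_tax/{year}/"))
```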
Company Overview: Legato Health Technologies is a healthcare technology company that provides solutions to improve healthcare delivery and outcomes
Client: Anthem - Anthem is the second-largest health insurance provider in the US and the largest for-profit managed health care company in the Blue Cross and Blue Shield Association
Project Description: The Client Information Insights: Discover (CII: Discover) is an interactive reporting tool with modules that allow users to visualize trends across the population spectrum of engaged and non-engaged members
The tool integrates metrics from claims, clinical programs, Cost of Care, and Utilization Management, as well as Financial Cost and Utilization, Provider Networks, and Specialty Services such as pharmacy
Responsibilities: Built AWS Glue jobs using Python for RDS data manipulation based on business requirements
Converted traditional SQL stored procedures to PySpark jobs to increase performance and provide better scalability (see the first sketch at the end of this section)
Tuned various mappings and sessions to increase performance
Used PySpark to load data from one environment to another
Created Hive tables and loaded data into them
Proficient in writing UDFs in PySpark (see the second sketch at the end of this section)
Designed Spark applications for ETL and for further data transformations on various datasets and DataFrames
Created external tables in Hive pointing to HDFS or S3 locations (see the third sketch at the end of this section)
Worked on various AWS cloud services, including Lambda, Step Functions, EMR, and S3
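A hedged illustration of the stored-procedure-to-PySpark conversions mentioned above; the original procedure, schema, and RDS endpoint are all invented for the example.

```python
# The procedure logic being replaced is sketched in the comment below;
# it, the schema, and the connection details are hypothetical.
#
#   -- SELECT member_id, SUM(claim_amount) AS total_claims
#   -- FROM claims WHERE claim_month = :month GROUP BY member_id;
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sp-to-pyspark").getOrCreate()

claims = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://rds-host:5432/claimsdb")  # placeholder
          .option("dbtable", "claims")
          .option("user", "etl_user")
          .option("password", "****")
          .load())

monthly = (claims.filter(F.col("claim_month") == "2020-06")
                 .groupBy("member_id")
                 .agg(F.sum("claim_amount").alias("total_claims")))

# The scalability gain: the same logic now runs distributed across the
# cluster instead of row-by-row inside the database engine.
monthly.write.mode("overwrite").parquet("s3://example-bucket/claims/monthly_totals/")
```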
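A minimal PySpark UDF sketch; the masking rule and column names are invented for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").getOrCreate()

@F.udf(returnType=StringType())
def mask_member_id(member_id):
    # Hypothetical rule: keep the last 4 characters, mask the rest.
    if member_id is None:
        return None
    return "*" * (len(member_id) - 4) + member_id[-4:]

df = spark.createDataFrame([("MBR12345678",)], ["member_id"])
df.select(mask_member_id("member_id").alias("masked")).show()
# Built-in functions (F.*) are preferred when they can express the logic,
# since Python UDFs ship each row to a Python worker and back.
```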
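A short sketch of creating a Hive external table over an S3 location through Spark SQL; the database, schema, bucket, and path are placeholders.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("external-table-example")
         .enableHiveSupport()
         .getOrCreate())

# EXTERNAL means Hive tracks only the metadata; dropping the table leaves
# the underlying S3 (or HDFS) files intact.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS reporting.member_events (
        member_id  BIGINT,
        event_type STRING,
        event_ts   TIMESTAMP
    )
    STORED AS PARQUET
    LOCATION 's3://example-bucket/curated/member_events/'
""")
```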
Software Engineer
Cyient Pvt Ltd
Hyderabad
01.2019 - 02.2021
Project Description: Brookstone is a chain of retail stores in the United States and China
The objective of the project is to collect, consolidate and analyze marketing data coming from multiple channels
The results from the analytics are used to build various predictive models for purchase propensity of the customer
Responsibilities: Possess in-depth knowledge of Hadoop architecture and its components: HDFS, NameNode, DataNode, ResourceManager, and NodeManager
Extracted data from S3 buckets and transformed it using Spark on an EMR cluster (see the sketch at the end of this project)
Loaded the transformed results to another S3 bucket
Analyzed and manipulated large datasets to find data patterns and insights to support better business decisions
Worked on Spark performance tuning, transformations, and actions
Used Spark cache and persist for in-memory computation
Worked with different file formats: CSV, JSON, ORC, and Parquet
Analyzed data using Hadoop components: Hive and Spark DataFrames
Scripted complex HiveQL queries on Hive tables to analyze large datasets
Created Hive partitions, buckets, and external and managed tables
Responsible for creating Hive tables, loading the structured data into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns
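A sketch of the S3-to-S3 ETL pattern described in this project, including persist for a reused intermediate result; bucket names, columns, and the storage level are assumptions.

```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("marketing-etl").getOrCreate()

# Extract: raw channel data lands in S3 as JSON.
raw = spark.read.json("s3://example-bucket/raw/marketing_events/")

# Transform: clean and standardize; persist because two outputs reuse it.
clean = (raw.filter(F.col("event_type").isNotNull())
            .withColumn("event_date", F.to_date("event_ts")))
clean.persist(StorageLevel.MEMORY_AND_DISK)

by_channel = clean.groupBy("channel", "event_date").count()
by_customer = clean.groupBy("customer_id").agg(F.count("*").alias("events"))

# Load: write results back to S3 in columnar formats for downstream analysis.
by_channel.write.mode("overwrite").parquet("s3://example-bucket/curated/by_channel/")
by_customer.write.mode("overwrite").orc("s3://example-bucket/curated/by_customer/")

clean.unpersist()
```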
Project Description: Amerigroup strives to meet customer needs for health plan products and services
In doing so, Amerigroup works with its members and providers to make health care accessible, affordable, and a means by which members improve their health
We process large amounts of data to increase insurance sales, process claims summaries, and enable data-oriented decision making
Responsibilities: Created a data lake to absorb the existing history data from OLAP databases and meet the need to process transactional data, and coordinated with the data modelers to create Hive tables
Migrated ETL processes from Oracle to Hive for faster and easier data manipulation
Performed data transformations in Pig and Hive, and used partitions and buckets for performance improvements
Analyzed system failures, identified root causes, and recommended courses of action
Worked on Hive to expose data for further analysis and to generate transformation files from different analytical formats to text files
Imported data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop (see the sketch at the end of this project)
Loaded and transformed large sets of structured, semi-structured, and unstructured data
Responsible for creating Hive tables, loading the structured data produced by MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns
Worked with support teams to resolve operational and performance issues
Solved performance issues in Hive and Pig with an understanding of how joins, grouping, and aggregation translate to MapReduce
Configured Oozie workflows to run multiple Hive jobs that run independently on a schedule
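Sqoop itself is a command-line tool, so there is no direct Python equivalent; as a rough analogue of the MySQL-to-HDFS import above, a Spark JDBC read achieves a similar parallel import. Every connection detail, table name, and path here is a placeholder.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mysql-import").getOrCreate()

members = (spark.read.format("jdbc")
           .option("url", "jdbc:mysql://db-host:3306/insurance")  # placeholder
           .option("dbtable", "members")
           .option("user", "etl_user")
           .option("password", "****")
           # Roughly what Sqoop's --split-by/--num-mappers do: split the
           # read into parallel range queries on a numeric key.
           .option("partitionColumn", "member_id")
           .option("lowerBound", "1")
           .option("upperBound", "1000000")
           .option("numPartitions", "8")
           .load())

# Land the imported data in HDFS for downstream Hive analysis.
members.write.mode("overwrite").parquet("hdfs:///data/raw/members/")
```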