To work in an organization as a collaborative data engineer, making use of my substantial knowledge in designing and executing solutions for complex business problems involving large-scale data warehousing, real-time analytics, and reporting. Known for using the right tools when and where they make sense, and for creating intuitive architectures that help organizations effectively analyze and process terabytes of structured and unstructured data.
An IT professional with 5+ years of experience in software development and the implementation of Big Data technologies.
Proficient in working with Big Data and the Hadoop Distributed File System (HDFS).
Proficient in AWS EMR (Hadoop and Spark), S3, Athena, and Snowflake.
In-depth understanding of Apache Spark architecture; performed several batch operations using Spark (Core, SQL), RDDs, and DataFrames.
Implemented Spark jobs using Python and Spark SQL for faster testing and processing of data (see the first sketch at the end of this summary).
Expertise in Python programming.
Experience in working with Eco systems like Hive.
Performed Hive operations on large datasets, with proficiency in writing HiveQL queries using transactional and performance-efficient concepts: partitioning, bucketing, and efficient and effective join operations (see the second sketch at the end of this summary).
Experienced with different file formats such as Parquet, ORC, Avro, SequenceFile, CSV, JSON, and plain text files.
Scheduled jobs and automated workflows using AWS CloudWatch and Control-M.
Experience with Agile development and the Scrum process.
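A minimal sketch of the kind of PySpark batch job described above, combining the DataFrame API with an equivalent Spark SQL query; all paths, tables, and column names are illustrative placeholders, not details from any client engagement.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-example").getOrCreate()

# Read a Parquet dataset into a DataFrame.
df = spark.read.parquet("s3://example-bucket/input/transactions/")

# DataFrame API: filter and aggregate.
daily = (df.filter(F.col("amount") > 0)
           .groupBy("txn_date")
           .agg(F.sum("amount").alias("total_amount")))

# The same logic expressed in Spark SQL over a temporary view.
df.createOrReplaceTempView("transactions")
daily_sql = spark.sql("""
    SELECT txn_date, SUM(amount) AS total_amount
    FROM transactions
    WHERE amount > 0
    GROUP BY txn_date
""")

daily.write.mode("overwrite").parquet("s3://example-bucket/output/daily_totals/")
```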
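A second sketch illustrating the partitioning and bucketing concepts mentioned above, issued as HiveQL through PySpark; the tables, columns, and bucket count are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-ddl-example")
         .enableHiveSupport()
         .getOrCreate())

# Partitioning by txn_date lets queries that filter on the date prune whole
# directories; bucketing by customer_id lets joins on that key avoid a full
# shuffle. Table, columns, and bucket count are hypothetical.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales (
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (txn_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# A join that benefits from bucketing on customer_id (the customers table
# is assumed to exist with a matching customer_id column).
result = spark.sql("""
    SELECT s.customer_id, c.segment, SUM(s.amount) AS total
    FROM sales s
    JOIN customers c ON s.customer_id = c.customer_id
    WHERE s.txn_date = '2021-01-01'
    GROUP BY s.customer_id, c.segment
""")
result.show()
```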
Overview
5 years of professional experience
1 certification
Work History
Data Engineer
IBM
Hyderabad
01.2022 - 03.2023
Client: American Express - The American Express Company (Amex) is an American multinational corporation specializing in payment card services, headquartered at 200 Vesey Street in the Battery Park City neighborhood of Lower Manhattan in New York City. The company was founded in 1850 and is one of the 30 components of the Dow Jones Industrial Average. The company's logo, adopted in 1958, is a gladiator or centurion whose image appears on the company's well-known traveler's cheques, charge cards, and credit cards.
Project Description: Export blue - Global taxation: generation of global tax audit reports containing transaction and merchant data from Amex credit card usage
Responsibilities: Designed report-generation applications using PySpark and formatted the data per business requirements
Built tax-related PySpark applications that are used by the business
Automated the PySpark scripts for retrieval of yearly audit data
Created Hive external tables over data in HDFS (see the sketch below)
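A hypothetical sketch of what a yearly tax-audit report job like those described above might look like in PySpark; the database, tables, column names, and output path are placeholders, not Amex internals.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("global-tax-report")
         .enableHiveSupport()
         .getOrCreate())

year = "2022"  # in production this would come from the scheduler

# Read transaction and merchant data from (hypothetical) Hive tables.
txns = spark.table("audit.transactions").filter(F.col("txn_year") == year)
merchants = spark.table("audit.merchants")

# Join and aggregate per the assumed reporting requirement.
report = (txns.join(merchants, "merchant_id")
              .groupBy("merchant_id", "merchant_name", "country")
              .agg(F.round(F.sum("tax_amount"), 2).alias("total_tax")))

# Write the formatted report as a single CSV for audit consumers.
(report.coalesce(1)
       .write.mode("overwrite")
       .option("header", True)
       .csv(f"hdfs:///reports/global_tax/{year}/"))
```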
Company Overview: Legato Health Technologies is a healthcare technology company that provides solutions to improve healthcare delivery and outcomes
Client: Anthem - Anthem is the second-largest health insurance provider in the US and the largest for-profit managed health care company in the Blue Cross and Blue Shield Association
Project Description: The Client Information Insights: Discover (CII: Discover) is an interactive reporting tool with modules that allow users to visualize trends across the population spectrum of engaged and non-engaged members
The tool integrates metrics from claims, clinical programs, Cost of Care, and Utilization Management, as well as Financial Cost and Utilization, Provider Networks, and Specialty Services such as pharmacy
Responsibilities: Built AWS Glue jobs using Python for RDS data manipulation based on business requirements
Converted traditional SQL stored procedures to PySpark jobs to increase performance and provide better scalability (see the first sketch at the end of this section)
Tuned various mappings and sessions to increase performance
Used PySpark to load data from one environment to another
Created Hive tables and loaded data into them
Proficient in writing UDFs in PySpark (see the second sketch at the end of this section)
Designed Spark applications for ETL and for further data transformations on various datasets and DataFrames
Created external tables in Hive pointing to HDFS or S3 locations (see the third sketch at the end of this section)
Worked on various AWS cloud services, including Lambda, Step Functions, EMR, and S3
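A hedged illustration of the stored-procedure-to-PySpark conversions mentioned above; the original procedure, schema, and RDS endpoint are all invented for the example.

```python
# The procedure logic being replaced is sketched in the comment below;
# it, the schema, and the connection details are hypothetical.
#
#   -- SELECT member_id, SUM(claim_amount) AS total_claims
#   -- FROM claims WHERE claim_month = :month GROUP BY member_id;
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sp-to-pyspark").getOrCreate()

claims = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://rds-host:5432/claimsdb")  # placeholder
          .option("dbtable", "claims")
          .option("user", "etl_user")
          .option("password", "****")
          .load())

monthly = (claims.filter(F.col("claim_month") == "2020-06")
                 .groupBy("member_id")
                 .agg(F.sum("claim_amount").alias("total_claims")))

# The scalability gain: the same logic now runs distributed across the
# cluster instead of row-by-row inside the database engine.
monthly.write.mode("overwrite").parquet("s3://example-bucket/claims/monthly_totals/")
```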
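A minimal PySpark UDF sketch; the masking rule and column names are invented for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").getOrCreate()

@F.udf(returnType=StringType())
def mask_member_id(member_id):
    # Hypothetical rule: keep the last 4 characters, mask the rest.
    if member_id is None:
        return None
    return "*" * (len(member_id) - 4) + member_id[-4:]

df = spark.createDataFrame([("MBR12345678",)], ["member_id"])
df.select(mask_member_id("member_id").alias("masked")).show()
# Built-in functions (F.*) are preferred when they can express the logic,
# since Python UDFs ship each row to a Python worker and back.
```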
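A short sketch of creating a Hive external table over an S3 location through Spark SQL; the database, schema, bucket, and path are placeholders.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("external-table-example")
         .enableHiveSupport()
         .getOrCreate())

# EXTERNAL means Hive tracks only the metadata; dropping the table leaves
# the underlying S3 (or HDFS) files intact.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS reporting.member_events (
        member_id  BIGINT,
        event_type STRING,
        event_ts   TIMESTAMP
    )
    STORED AS PARQUET
    LOCATION 's3://example-bucket/curated/member_events/'
""")
```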
Software Engineer
Cyient Pvt Ltd
Hyderabad
01.2019 - 02.2021
Project Description: Brookstone is a chain of retail stores in the United States and China
The objective of the project is to collect, consolidate and analyze marketing data coming from multiple channels
The results from the analytics are used to build various predictive models for purchase propensity of the customer
Responsibilities: Possess in-depth knowledge of Hadoop architecture and its components: HDFS, NameNode, DataNode, ResourceManager, and NodeManager
Extracted data from S3 buckets and transformed it using Spark on an EMR cluster (see the sketch at the end of this project)
Loaded the transformed results to another S3 bucket
Analyzed and manipulated large datasets to find data patterns and insights to support better business decisions
Worked on Spark performance tuning, transformations, and actions
Used Spark cache and persist for in-memory computation
Worked with different file formats: CSV, JSON, ORC, and Parquet
Analyzed data using Hadoop components: Hive and Spark DataFrames
Scripted complex HiveQL queries on Hive tables to analyze large datasets
Created Hive partitions, buckets, and external and managed tables
Responsible for creating Hive tables, loading the structured data into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns
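A sketch of the S3-to-S3 ETL pattern described in this project, including persist for a reused intermediate result; bucket names, columns, and the storage level are assumptions.

```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("marketing-etl").getOrCreate()

# Extract: raw channel data lands in S3 as JSON.
raw = spark.read.json("s3://example-bucket/raw/marketing_events/")

# Transform: clean and standardize; persist because two outputs reuse it.
clean = (raw.filter(F.col("event_type").isNotNull())
            .withColumn("event_date", F.to_date("event_ts")))
clean.persist(StorageLevel.MEMORY_AND_DISK)

by_channel = clean.groupBy("channel", "event_date").count()
by_customer = clean.groupBy("customer_id").agg(F.count("*").alias("events"))

# Load: write results back to S3 in columnar formats for downstream analysis.
by_channel.write.mode("overwrite").parquet("s3://example-bucket/curated/by_channel/")
by_customer.write.mode("overwrite").orc("s3://example-bucket/curated/by_customer/")

clean.unpersist()
```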
Project Description: Amerigroup strives to meet customer needs for health plan products and services
In doing so, Amerigroup works with its members and providers to make health care accessible, affordable, and a means by which members improve their health
We process large amounts of data to increase insurance sales, process claims summaries, and enable data-oriented decision making
Responsibilities: Created a data lake to absorb the existing history data from OLAP databases and meet the need to process transactional data, and coordinated with the data modelers to create Hive tables
Migrated ETL processes from Oracle to Hive for faster and easier data manipulation
Performed data transformations in Pig and Hive, and used partitions and buckets for performance improvements
Analyzed system failures, identified root causes, and recommended courses of action
Worked on Hive to expose data for further analysis and to generate transformation files from different analytical formats to text files
Imported data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop (see the sketch at the end of this project)
Loaded and transformed large sets of structured, semi-structured, and unstructured data
Responsible for creating Hive tables, loading the structured data produced by MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns
Worked with support teams to resolve operational and performance issues
Solved performance issues in Hive and Pig with an understanding of how joins, grouping, and aggregation translate to MapReduce
Configured Oozie workflows to run multiple Hive jobs that run independently on a schedule
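Sqoop itself is a command-line tool, so there is no direct Python equivalent; as a rough analogue of the MySQL-to-HDFS import above, a Spark JDBC read achieves a similar parallel import. Every connection detail, table name, and path here is a placeholder.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mysql-import").getOrCreate()

members = (spark.read.format("jdbc")
           .option("url", "jdbc:mysql://db-host:3306/insurance")  # placeholder
           .option("dbtable", "members")
           .option("user", "etl_user")
           .option("password", "****")
           # Roughly what Sqoop's --split-by/--num-mappers do: split the
           # read into parallel range queries on a numeric key.
           .option("partitionColumn", "member_id")
           .option("lowerBound", "1")
           .option("upperBound", "1000000")
           .option("numPartitions", "8")
           .load())

# Land the imported data in HDFS for downstream Hive analysis.
members.write.mode("overwrite").parquet("hdfs:///data/raw/members/")
```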