Venkatasagar Mangapuri

Hicksville, NY

Summary

  • To work in an organization as a collaborative data engineer, applying substantial knowledge in designing and executing solutions for complex business problems involving large-scale data warehousing, real-time analytics, and reporting. Known for using the right tools when and where they make sense and for creating intuitive architectures that help organizations effectively analyze and process terabytes of structured and unstructured data.
  • IT professional with 5+ years of experience in software development and the implementation of Big Data technologies.
  • Proficient in working with Big Data and the Hadoop Distributed File System (HDFS).
  • Proficient in working with AWS EMR (Hadoop and Spark), S3, Athena, and Snowflake.
  • In-depth understanding of Apache Spark architecture; performed several batch operations using Spark (Core, SQL), RDDs, and DataFrames.
  • Implemented Spark using Python and Spark SQL for faster testing and processing of data (see the sketch after this list).
  • Expertise in Python programming.
  • Experience in working with ecosystem components such as Hive.
  • Performed Hive operations on large datasets, with proficiency in writing HiveQL queries using transactional and performance-efficient concepts: partitioning, bucketing, and efficient join operations.
  • Experienced with different file formats such as Parquet, ORC, Avro, Sequence, CSV, JSON, and text files.
  • Scheduled jobs and automated workflows using CloudWatch and Control-M.
  • Experience in Agile Development and Scrum process.
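
A minimal PySpark sketch of the kind of batch job described above: read raw CSV, apply DataFrame and Spark SQL transformations, and write partitioned Parquet. The bucket, table, and column names are hypothetical placeholders, not taken from any specific project.

    # Minimal PySpark batch-ETL sketch: CSV in, partitioned Parquet out.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("batch-etl-sketch").getOrCreate()

    # Read raw CSV from an illustrative S3 location.
    raw = spark.read.option("header", "true").csv("s3://example-bucket/raw/transactions/")

    # Basic cleanup with the DataFrame API: cast types, drop bad rows.
    cleaned = (
        raw.withColumn("amount", F.col("amount").cast("double"))
           .withColumn("txn_date", F.to_date("txn_date", "yyyy-MM-dd"))
           .filter(F.col("amount").isNotNull())
    )

    # Aggregate with Spark SQL over a temporary view.
    cleaned.createOrReplaceTempView("transactions")
    daily = spark.sql("""
        SELECT txn_date, merchant_id, SUM(amount) AS total_amount
        FROM transactions
        GROUP BY txn_date, merchant_id
    """)

    # Write the result partitioned by date for efficient downstream reads.
    daily.write.mode("overwrite").partitionBy("txn_date").parquet(
        "s3://example-bucket/curated/daily_totals/")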

Overview

  • 5 years of professional experience
  • 1 certification

Work History

Data Engineer

IBM
Hyderabad
01.2022 - 03.2023
  • Client: American Express - The American Express Company (Amex) is an American multinational corporation specializing in payment card services, headquartered at 200 Vesey Street in the Battery Park City neighborhood of Lower Manhattan in New York City
  • The company was founded in 1850 and is one of the 30 components of the Dow Jones Industrial Average
  • The company's logo, adopted in 1958, is a gladiator or centurion whose image appears on the company's well-known traveler's cheques, charge cards, and credit cards
  • Project Description: Export blue - Global taxation: report generation of global audit data related to tax, containing transaction and merchant data for AMEX credit cards
  • Responsibilities: Designing report generation apps using PySpark and formatting the data as per the business requirements (see the sketch after this list)
  • Building tax-related apps with PySpark that are used by the business
  • Automation of the PySpark scripts for retrieval of yearly audit data
  • Worked on Hive for the creation of external tables in HDFS
  • Involved in tuning the PySpark scripts
  • Environment: PySpark (2.4.6), Python (3.7), Cloudera, Hive, Event Engine, Agile, Rally
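
A hedged sketch of a yearly audit-report job of the sort described above: pull one year of records from a Hive table, shape the columns for the report layout, and write a delimited extract. The database, table, column, and path names are assumptions for illustration only.

    # Illustrative yearly report-generation job; the report year is passed
    # as a command-line argument so the script can be scheduled/automated.
    import sys
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    year = int(sys.argv[1]) if len(sys.argv) > 1 else 2022

    spark = (SparkSession.builder
             .appName(f"global-tax-audit-{year}")
             .enableHiveSupport()
             .getOrCreate())

    # Select one year of records from a hypothetical Hive table.
    report = (
        spark.table("taxdb.global_transactions")
             .filter(F.year("txn_date") == year)
             .select("merchant_id", "txn_date", "tax_code", "amount")
             .withColumn("amount", F.format_number("amount", 2))
    )

    # Write a single pipe-delimited extract for the audit report.
    (report.coalesce(1).write.mode("overwrite")
           .option("header", "true").option("sep", "|")
           .csv(f"/reports/global_tax_audit/{year}"))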

Sr. Software Engineer

Legato Health Technologies
Hyderabad
03.2021 - 01.2022
  • Company Overview: Legato Health Technologies is a healthcare technology company that provides solutions to improve healthcare delivery and outcomes
  • Client: Anthem - Anthem is the second largest health care insurance provider in the US; it is the largest for-profit managed health care company in the Blue Cross and Blue Shield Association
  • Project Description: The Client Information Insights: Discover (CII: Discover) is an interactive reporting tool with modules that allow users to visualize trends across the population spectrum of engaged and non-engaged members
  • The tool integrates metrics from claims, clinical programs, Cost of Care, and Utilization Management, as well as Financial Cost and Utilization, Provider Networks, and Specialty Services such as pharmacy
  • Responsibilities: Building Glue jobs using Python for RDS data manipulation based on the business requirements
  • Converting traditional SQL stored procedures to PySpark jobs to increase performance and provide better scalability
  • Involved in performance tuning of various mappings and sessions to increase performance
  • Used PySpark to load data from one environment to another
  • Created Hive tables and involved in data loading
  • Efficient in writing UDFs in PySpark (see the sketch after this list)
  • Designed Spark Applications for ETL
  • Designed Spark applications to do further data transformations on various data sets and dataframes
  • Creating external tables in Hive pointing to an HDFS or S3 location
  • Worked on various AWS cloud services like Lambda, Step Functions, EMR, and S3
  • Worked with Snowflake data for faster querying
  • Environment: PySpark (2.4.6), Python (3.7), Glue, Athena, AWS S3, EMR, Snowflake, RDS, Lambda, Agile
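
An illustrative PySpark UDF of the kind mentioned above, with placeholder logic and column names: normalize an identifier before joining data sets. In practice the same rule can often be expressed with built-in functions, which usually perform better than a Python UDF.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

    # A simple Python UDF: trim, upper-case, and strip dashes from an id.
    @F.udf(returnType=StringType())
    def normalize_member_id(raw_id):
        if raw_id is None:
            return None
        return raw_id.strip().upper().replace("-", "")

    # Hypothetical sample data to show the UDF in use.
    members = spark.createDataFrame([(" abc-123 ",), ("XYZ-999",)], ["member_id"])
    members.withColumn("member_key", normalize_member_id(F.col("member_id"))).show()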

Software Engineer

Cyient Pvt Ltd
Hyderabad
01.2019 - 02.2021
  • Project Description: Brookstone is a chain of retail stores in the United States and China
  • The objective of the project is to collect, consolidate and analyze marketing data coming from multiple channels
  • The results from the analytics are used to build various predictive models for purchase propensity of the customer
  • Responsibilities: Possess in-depth knowledge of Hadoop architecture and its components: HDFS, NameNode, DataNode, ResourceManager, and NodeManager
  • Extracted data from S3 buckets and transformed it using Spark on an EMR cluster (see the sketch after this list)
  • Loaded the results of the transformed data into another S3 bucket
  • Analyzed and manipulated large datasets to find data patterns and insights that support better business decisions
  • Worked on Spark, Spark performance tuning, and Spark transformations and actions
  • Worked on Spark cache and persist for in-memory computation
  • Worked on different file formats such as CSV, JSON, ORC, and Parquet
  • Analyzed data using Hadoop components: Hive and Spark DataFrames
  • Scripted complex HiveQL queries on Hive tables to analyze large datasets
  • Created Hive partitions, buckets, external and managed tables and more
  • Responsible for creating Hive tables, loading the structured data into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns
  • Environment: AWS, Spark, Scala, Hive, Hadoop, EMR, S3, Athena, Agile
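
A minimal sketch (with hypothetical bucket and column names) of the S3-to-S3 flow described above: read JSON events from one bucket, aggregate them with Spark on the EMR cluster, and write the result back to another bucket as Parquet.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("marketing-etl-sketch").getOrCreate()

    # Read raw marketing events from an illustrative source bucket.
    events = spark.read.json("s3://example-marketing-raw/events/")

    # Summarize touches per customer and channel.
    channel_summary = (
        events.groupBy("customer_id", "channel")
              .agg(F.count("*").alias("touch_count"),
                   F.max("event_ts").alias("last_touch_ts"))
    )

    # Load the transformed result into a separate curated bucket.
    channel_summary.write.mode("overwrite").parquet(
        "s3://example-marketing-curated/channel_summary/")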

Associate Software Engineer

Cyient Pvt Ltd
Hyderabad
01.2018 - 01.2019
  • Project Description: Amerigroup strives to meet the customer needs for health plan products and services
  • In doing so, Amerigroup works with its members and providers to make health care accessible, affordable, and a means by which members improve their health
  • We process large amounts of data to increase insurance sales, process claims summaries, and enable data-oriented decision making
  • Responsibilities: Created a data lake that embraces the existing history data from OLAP databases and meets the need to process transactional data, and coordinated with the data modelers to create Hive tables (see the sketch after this list)
  • Migrated ETL processes from Oracle to Hive to test faster and easier data manipulation
  • Performed data transformations in Pig and Hive and used partitions and buckets for performance improvements
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action
  • Worked on Hive for exposing data for further analysis and for generating transformation files from different analytical formats to text files
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data
  • Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns
  • Worked with support teams and resolved operational & performance issues
  • Solved performance issues in Hive and Pig with an understanding of joins, grouping, and aggregation, and how they translate to MapReduce
  • Configured Oozie workflows to run multiple Hive jobs that run independently on a time-based schedule
  • Environment: Cloudera, HDFS, MapReduce, Hive, Sqoop, Pig, Java, Oracle 11g, Oozie, SQL, CentOS
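
A sketch of the Hive-side pattern described above: define an external, partitioned table over data already landed in HDFS (for example by Sqoop imports or MapReduce jobs) and register a partition. Database, table, and path names are illustrative; the DDL is shown through Spark SQL only for consistency with the other sketches, and the same statements run unchanged in the Hive CLI or Beeline.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-ddl-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # External table over an existing HDFS location (hypothetical schema).
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS claims_db.claims_history (
            claim_id STRING,
            member_id STRING,
            claim_amount DOUBLE
        )
        PARTITIONED BY (claim_year INT)
        STORED AS ORC
        LOCATION '/data/claims/history'
    """)

    # Register a partition that points at data already present in HDFS.
    spark.sql("""
        ALTER TABLE claims_db.claims_history
        ADD IF NOT EXISTS PARTITION (claim_year=2018)
        LOCATION '/data/claims/history/claim_year=2018'
    """)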

Education

Master's - IT Project Management

St. Francis College
Brooklyn
12.2024

PG Diploma in Data Science (Online) - Information Technology

Manipal Academy of Higher Education
Bangalore
06.2020

B.Tech - Computer Science and Engineering

JNTU
Hyderabad
01.2017

Skills

  • AWS
  • Cloudera (CDH4, CDH5, CDH 6.3.4)
  • Hadoop
  • PySpark (Core, SQL)
  • HDFS
  • Hive
  • Linux/UNIX
  • Python
  • SQL Server
  • PyCharm
  • IntelliJ IDEA
  • Eclipse
  • Shell Scripting
  • Python Programming
  • Deep learning
  • Statistical Analysis
  • Machine learning

Personal Information

  • Father's Name: M Gopi
  • Date of Birth: 06/06/95
  • Nationality: Indian

Languages

  • Telugu
  • English
  • Hindi

Disclaimer

I hereby declare that the above information is correct to the best of my knowledge and abilities.

Languages

English
Professional

Accomplishments

  • Have good knowledge of statistics and machine learning models for predicting data insights, as well as data scraping and deep learning techniques such as NLP.
  • 5-star rating for Python and SQL on HackerRank, a well-known website for solving problems using programming languages.
  • Arctic Code Vault Contributor: the Arctic Code Vault is a data repository that stores a snapshot of every active public GitHub repository as of February 2, 2020.

Certification

  • PG Diploma in Data Science from Manipal University, a top university in India.

Timeline

Data Engineer

IBM
01.2022 - 03.2023

Sr. Software Engineer

Legato Health Technologies
03.2021 - 01.2022

Software Engineer

Cyient Pvt Ltd
01.2019 - 02.2021

Associate Software Engineer

Cyient Pvt Ltd
01.2018 - 01.2019

Master's - IT Project Management

St. Francis College

PG Diploma in Data Science (Online) - Information Technology

Manipal Academy of Higher Education

B.tech - Computer Science and Engineering

JNTU