Manohar Rao Ponugoti

Overland Park, KS

Summary

Data Engineer with 1.8 years of experience building and optimizing Big Data applications using Hadoop ecosystem technologies such as HDFS, Hive, Sqoop, and Apache Spark. Specializes in designing scalable data pipelines and automating data workflows, ensuring efficient processing and seamless integration with cloud platforms such as AWS, with a focus on delivering high-performance, reliable solutions in production environments.

Overview

2 years of professional experience

Work History

Big Data Engineer

Tata Consultancy Services
Hyderabad, Telangana
12.2021 - 12.2022
  • Crafted and fine-tuned complex Hive queries to efficiently process massive volumes of data.
  • Integrated Hive with other big data technologies, such as Hadoop and Spark, to optimize data processing workflows.
  • Designed and managed Extract, Transform, Load (ETL) processes using Hive, leading to improved data consistency and accuracy.
  • Managed EMR cluster configurations and scaling based on workload requirements.
  • Leveraged Spark RDDs (Resilient Distributed Datasets) for low-level data processing tasks.
  • Integrated Spark with AWS Glue for automated ETL pipelines and schema evolution.
  • Integrated Spark with AWS Lambda for serverless data processing solutions.
  • Created lambda functions in AWS to run ECS containers.
  • Designed and implemented complex data integration solutions using Sqoop.
  • Used Sqoop to import and export data with complex schemas and data types.
  • Implemented Sqoop-based solutions to load data from external databases into Hadoop clusters using incremental imports.
  • Automated data migrations between Hadoop clusters in different geographical regions using Sqoop.
  • Implemented data partitioning and shuffling strategies for optimization.
  • Created custom Spark applications for specific business use cases.
  • Optimized Spark jobs and data processing workflows for scalability, performance, and cost efficiency using techniques such as partitioning, compression, and caching.
  • Optimized Spark SQL performance by tuning configuration settings such as memory allocation, caching, and serialization.
  • Used Spark SQL to process large-scale structured and semi-structured data sets, including querying, filtering, grouping, and aggregating data.
  • Managed and optimized data storage in Google Cloud Storage, ensuring efficient data organization, access, and security.
  • Deployed and managed data processing clusters with Google Dataproc, leveraging its scalability and automation features for large-scale data analysis.
  • Managed Google Compute Engine instances, including image creation, network configuration, and instance scaling.
  • Applied Google Cloud Functions triggers and bindings for seamless integration with event-driven workflows.
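As a hypothetical sketch of the hash-partitioning idea behind the Spark optimization work described above (the record keys and partition count here are invented for illustration, not taken from any actual project):

```python
import hashlib

def partition_for(key: str, num_partitions: int = 8) -> int:
    """Assign a record key to a partition by hashing it, the same
    principle Spark's HashPartitioner uses to spread data evenly
    across tasks and avoid skewed shuffles."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# Bucket some sample record keys into partitions.
keys = [f"order-{i}" for i in range(100)]
partitions: dict[int, list[str]] = {}
for k in keys:
    partitions.setdefault(partition_for(k), []).append(k)

# Every key lands in exactly one partition, and the same key
# always maps to the same partition.
assert sum(len(v) for v in partitions.values()) == 100
```

In Spark itself this would typically be expressed with `repartition` or a custom partitioner rather than hand-rolled hashing; the sketch only shows the underlying idea.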

Data Engineer Intern

Magnibot Technology solutions India Pvt Ltd
Bangalore
10.2020 - 08.2021
  • Integrated Spark with external data sources like JDBC and APIs for data extraction.
  • Integrated Spark with Hadoop ecosystems like HDFS and Hive for data storage and querying.
  • Collaborated with data engineers to design and optimize Spark data pipelines.
  • Worked with Spark DataFrame schema and data type operations, such as adding, renaming, and dropping columns, casting data types, and handling null values.
  • Applied Spark DataFrame optimization techniques, such as predicate pushdown, column pruning, and vectorized execution, to improve query performance and resource utilization.
  • Designed and developed batch processing data pipelines on Amazon EMR using Apache Spark, Python, and Scala to process terabytes of data in a cost-effective and scalable manner.
  • Performed data analysis, data quality checks, and data profiling to support the business team.
  • Worked with Spark's data serialization formats (Avro, Parquet, JSON, etc.).
  • Utilized Spark for log parsing and parsing unstructured data.
  • Designed and optimized Spark jobs for data deduplication.
  • Maintained and monitored Spark clusters on AWS EMR, ensuring high availability and fault tolerance.
  • Automated infrastructure provisioning and management on Google Compute Engine using Infrastructure as Code (IAC) tools like Terraform.
  • Developed serverless, event-driven workflows and applications on Google Cloud Functions for real-time data processing and automation, reducing infrastructure complexity.
  • Used Google Cloud Storage versioning and object archiving features to ensure data retention and compliance with data governance policies.
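A minimal pure-Python sketch of the keep-latest deduplication logic mentioned above; in a Spark job this would more likely be a window function ordered by timestamp or `dropDuplicates`, and the record shape here is an invented example:

```python
def deduplicate(records: list[dict]) -> list[dict]:
    """Keep only the most recent record per id, mirroring the
    keep-latest deduplication a Spark job might express with a
    window function ordered by a timestamp column."""
    latest: dict[str, dict] = {}
    for rec in records:
        key = rec["id"]
        # Replace the stored record only if this one is newer.
        if key not in latest or rec["ts"] > latest[key]["ts"]:
            latest[key] = rec
    return list(latest.values())

rows = [
    {"id": "a", "ts": 1, "value": "old"},
    {"id": "a", "ts": 3, "value": "new"},
    {"id": "b", "ts": 2, "value": "only"},
]
deduped = deduplicate(rows)
# Two distinct ids survive, each with its latest value.
```

This single-pass dictionary approach works for data that fits in memory; Spark distributes the same per-key reduction across executors.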

Education

Master of Science - Big Data Analytics & Information Technology

University of Central Missouri
Warrensburg, MO
03.2024

Timeline

Big Data Engineer

Tata Consultancy Services
12.2021 - 12.2022

Data Engineer Intern

Magnibot Technology solutions India Pvt Ltd
10.2020 - 08.2021

Master of Science - Big Data Analytics & Information Technology

University of Central Missouri