Bala Jyothi Thumma

Jersey City, NJ

Summary

Dynamic Data Engineer with hands-on experience at Amigos Software Solutions, specializing in designing optimized ETL pipelines and real-time data ingestion using Hadoop and Kafka. Proven ability to enhance data quality through rigorous validation while collaborating with cross-functional teams, backed by strong problem-solving skills and expertise in Python and SQL for data manipulation and process optimization.

I have extensive experience with big data technologies, including Hadoop, Kafka, Sqoop, and Oozie, which I have used for large-scale data processing and real-time streaming, and I am proficient with HDFS, MapReduce, and Hive for efficient data storage and querying in big data environments. In data engineering and ETL, I have designed and optimized pipelines using tools such as Informatica, IBM InfoSphere DataStage, Python, and SQL, ensuring smooth data transformation and integration across systems.

I have hands-on experience with leading cloud platforms such as AWS (including S3, Redshift, and EC2) and GCP, which I have leveraged to deliver scalable data storage, processing, and analytics solutions. My expertise also extends to data storage and management, where I have worked extensively with both SQL and NoSQL databases, including Oracle, DB2, MongoDB, and Netezza, streamlining data management and ensuring efficient processing.

I am highly skilled in data validation and quality assurance, using complex SQL queries to validate data and troubleshoot quality issues, ensuring the integrity and accuracy of data across environments. In programming and scripting, I am proficient in Python and SQL, using these languages to manipulate data, automate workflows, and build reliable data pipelines.

In data visualization, I use Tableau to turn complex datasets into actionable insights through intuitive dashboards and reports. I also have experience in DevOps and automation, having worked with Git, Jenkins, and CI/CD pipelines to automate workflows, manage version control, and streamline data operations.

I excel at performance optimization, improving the efficiency, speed, and reliability of data systems while keeping cloud data processing cost-effective. I am committed to continuous learning and innovation, staying current with emerging technologies in data engineering, and I am passionate about improving processes through cloud technologies and big data frameworks.

Overview

1 year of professional experience

Work History

Data Engineer

Amigos Software Solutions
02.2023 - 06.2023
  • Designed and Optimized ETL Pipelines: Created efficient ETL processes using Informatica to load data from flat files and Excel into Oracle Data Warehouses, ensuring seamless integration and high data quality.
  • Hands-on Big Data Experience: Ingested real-time data into Hadoop using Sqoop and Kafka, and managed Oozie job scheduling for daily imports while learning and optimizing big data processing workflows.
  • Data Validation & Transformation: Utilized Python and SQL to manipulate, validate, and clean data for business intelligence, ensuring data accuracy and preparing datasets for further analysis.
  • Data Modeling & Database Management: Designed E/R Diagrams, enforced referential integrity, and worked on DB2 database normalization/de-normalization for optimized data storage and retrieval.
  • Collaborative Development: Worked closely with cross-functional teams (business stakeholders, DBAs, and data analysts) to understand and translate business requirements into technical specifications, fostering strong communication and problem-solving skills.
  • Data Quality Assurance: Performed extensive data validation using complex SQL queries, enhancing back-end testing processes and resolving data quality issues for high-quality reporting.
  • Modern Data Formats: Gained hands-on experience with JSON, Parquet, and other file formats used in Hadoop, optimizing the storage and retrieval of large datasets in big data systems.
  • Continuous Learning & Adaptability: Demonstrated a quick learning curve by adapting to new tools and technologies, including Erwin Model Mart for effective model management and data platforms such as Netezza and IBM InfoSphere DataStage.

Data Engineering Intern

Careator Technologies Pvt Ltd
11.2022 - 02.2023
  • Assisted in collecting, cleaning, and preprocessing data from multiple sources, improving data quality by 25% to ensure high-quality datasets for analysis.
  • Supported the development and maintenance of ETL pipelines to efficiently extract, transform, and load data into data warehouses, reducing ETL processing time.
  • Worked with both SQL and NoSQL databases, performing data extraction and optimization tasks that improved query performance by 15%.
  • Collaborated with cross-functional teams (Data Scientists, Analysts) to understand data requirements, delivering actionable insights that helped improve project outcomes.
  • Contributed to the design and implementation of data storage models and architectures, leading to a 30% improvement in storage efficiency and query speed.
  • Monitored the performance of data pipelines and assisted in troubleshooting, maintaining 99.8% uptime and reducing pipeline failures by 15%.
  • Helped automate data workflows and integrate data from internal and external sources, reducing manual processing and streamlining data processes.
  • Gained hands-on experience with big data technologies (e.g., Hadoop, Spark) and cloud platforms (AWS, GCP), improving scalability and processing efficiency by 30%.

Education

Master of Science - M.S. in COMPUTER AND INFORMATION SCIENCES

Sacred Heart University
Fairfield, CT
05.2025

Bachelor of Science - B.Sc. in COMPUTER DATA SCIENCE AND DATA ANALYTICS

Loyola Academy
Alwal, Hyderabad
04.2023

Skills

  • Big Data Technologies (Hadoop, Kafka, Sqoop, Oozie): Expertise in large-scale data processing, real-time streaming, and data ingestion with Hadoop, Kafka, Sqoop, and Apache Oozie
  • Cloud Platforms (AWS, GCP, Azure): Hands-on experience with AWS, GCP, and Azure for scalable data storage, processing, and analytics in cloud environments
  • ETL Processes & Frameworks (Informatica, IBM InfoSphere DataStage): Skilled in designing, developing, and optimizing ETL pipelines using Informatica and IBM InfoSphere DataStage
  • Data Storage & Management (SQL, NoSQL, Oracle, DB2, Netezza): Proficient in managing and querying SQL and NoSQL databases (Oracle, DB2, Netezza) for data storage and processing
  • Data Validation & Quality Assurance: Strong experience in data validation and ensuring data quality through complex SQL queries and testing
  • Data Visualization & Reporting (Tableau): Proficient in using Tableau to create actionable insights and visualize complex datasets with interactive dashboards and reports
  • Programming & Scripting (Python, SQL): Expertise in Python for automation, data manipulation, and building data pipelines, alongside strong SQL skills for querying and data validation
