Indu Javvaji

Overland Park, USA

Summary

Data Engineer with 4+ years of experience in data engineering, business intelligence, and ETL development on AWS and Azure platforms. Proficient in Python and SQL, with strong expertise in distributed computing, data pipeline orchestration, and data warehouse management. Known for effective cross-functional collaboration and delivering data-driven insights in fast-paced environments.

Overview

4 years of professional experience

Work History

Data Engineer

Good One IT Solutions
08.2023 - 09.2024
  • Gained exposure to Amazon OpenSearch and created Kibana dashboards for Lambda log monitoring
  • Designed and developed an automated data pipeline for MDM of customer data, integrating multiple data sources and applying data quality checks
  • Designed ETL processes using AWS Glue and Lambda to transfer data from landing zones to data lakes, meeting specific business criteria
  • Built and automated Snowflake processes for daily data loading, enabling real-time business intelligence insights
  • Worked closely with product managers, software developers, and BI engineers to gather data requirements and deliver data-driven insights for dashboards, reports, and recommendation engines
  • Applied dynamic data masking in Snowflake to protect PII, maintaining compliance with data privacy standards
  • Skilled in Python and SQL for data processing and analysis, with experience in developing Spark scripts for data transformation and business logic automation
  • Developed and implemented data pipelines with AWS (S3, SNS, SQS, Glue, Lambda) for big data processing, supporting operational and analytical needs
  • Managed source code with Git and Bitbucket; involved in agile sprints to support product development
  • Developed and implemented a scalable end-to-end data pipeline for MDM of customer data using PySpark, EMR, SQL, and AWS technologies, resulting in a 50% reduction in data processing time and a 30% increase in data accuracy
  • Implemented data security and compliance measures using IAM roles, dynamic data masking, and encryption to protect sensitive information and PII in alignment with industry standards
  • Worked with non-relational databases like MongoDB for efficient storage and retrieval of semi-structured data, as well as real-time processing using Kafka and Kinesis
  • Developed stored procedures and tasks in Snowflake to automate the ingestion of data from S3 into Snowflake, ensuring timely and accurate data processing
  • Built automation scripts in Python and Unix shell; gained exposure to Jenkins
  • Developed Glue pipelines to load data into the different layers of the data lake
  • Developed Lambda functions to process SQS events into MongoDB
  • Involved in requirements gathering, design, development, and testing
  • Generated reports using Spark SQL for business requirements received on an ad hoc basis
  • Gained exposure to orchestration tools such as AWS Step Functions

Data Engineer

Trivium India Software Pvt Ltd.
06.2020 - 07.2022
  • Collaborated closely with cross-functional teams, including product managers, analytics experts, and business stakeholders, to assess and define data and analytics requirements
  • Developed scalable end-to-end data pipelines for customer MDM using PySpark, EMR, and AWS, reducing processing time by 50% and increasing data accuracy by 30%
  • Created automated ETL workflows with Airflow and AWS Glue, efficiently loading data from various sources into Snowflake
  • Configured Snowflake stored procedures and tasks for timely, accurate data ingestion from S3
  • Integrated data from multiple sources, leveraging data quality checks to ensure accuracy for business-critical insights
  • Developed and supported end-to-end data solutions including data lakes, ETL pipelines, and data warehouses using AWS services like S3, Glue, Lambda, Redshift, and Snowflake to meet internal and external reporting needs
  • Generated real-time reports using Spark SQL for ad hoc business requests, enhancing responsiveness to BI needs
  • Designed and implemented a series of ETL processes using AWS Glue and Lambda to seamlessly transfer data from landing zones to the data lake, aligning with specific business needs and criteria
  • Leveraged the capabilities of Snowflake to construct efficient procedures for loading data into various Dimensional and Fact tables within the data lake, ensuring optimal organization and accessibility
  • Streamlined and automated Snowflake procedures through the development of Glue jobs, enabling the systematic and daily loading of data to support real-time business intelligence needs
  • Empowered end-users by providing access to the DataMart, enabling them to create insightful BI reports that contribute to a deeper understanding of critical business insights
  • Extracted, transformed, and loaded data from multiple sources using SQL and scripting languages like Python
  • Experienced in building automated ETL workflows with AWS Glue and Apache Airflow for seamless data integration
  • Leveraged dynamic data masking features within Snowflake to selectively conceal sensitive fields or personally identifiable information (PII), minimizing the risk of unauthorized access
  • Experienced in source code management with Git and Bitbucket repositories
  • Knowledge of agile methodologies for delivering software solutions
  • Developed and implemented data pipelines using AWS services such as S3, SQS, SNS, Lambda, Glue Crawlers, Athena, EC2, and Glue to process big data
  • Actively participated in the creation of change requests and led deployments, orchestrating the smooth transition of code to higher environments to support ongoing development and optimization efforts

Education

Master's in Computer Information Systems and Information Technology

University of Central Missouri
12.2023

Skills

  • Hadoop
  • Spark
  • HDFS
  • MapReduce
  • Yarn
  • Sqoop
  • Hive
  • Glue
  • Python
  • SQL
  • Unix Shell
  • AWS
  • Microsoft Azure
  • Informatica PowerCenter
  • IICS
  • Airflow
  • Druid
  • Presto
  • Flink
  • Oracle
  • MS SQL server
  • DB2
  • Snowflake
  • MongoDB
  • Redshift
  • PostgreSQL
  • Putty
  • WinSCP
  • Kafka
  • AWS Kinesis
  • Tableau
  • Superset
  • Looker

Timeline

Data Engineer

Good One IT Solutions
08.2023 - 09.2024

Data Engineer

Trivium India Software Pvt Ltd.
06.2020 - 07.2022

Master's in Computer Information Systems and Information Technology

University of Central Missouri
12.2023