Venkatesh Chirumamilla

Naperville, IL

Summary

Data analytics and software development professional with 8 years of experience. Skilled in Hadoop, Spark, data modelling, and Agile. Experienced in solving complex problems with statistical and machine learning techniques. Proficient in architecting, designing, and managing AWS and GCP cloud platforms, ensuring scalability and reliability.

Overview

8 years of professional experience

Work History

Java Spark Developer

United Health Group, Optum
Remote, IL
10.2022 - Current

· Designed and implemented a data integration pipeline architecture, achieving a 30% improvement in data extraction, transformation, and loading efficiency.

· Documented comprehensive processes for Tableau Desktop usage, Tableau Server installation, and business requirements evaluation, resulting in a 30% reduction in onboarding time for new analysts.

· Extracted, transformed, and loaded data to generate CSV files using Python and SQL, improving data processing efficiency by 40%.

· Engineered data ingestion processes for the Global Data Lake on Google Cloud, achieving 40% automation using GCP.

· Leveraged expertise in BigQuery, Cloud Storage, and Dataproc to architect scalable data solutions, reducing data processing time by 50% through pipelines optimized with Airflow and Kafka.

· Architected, developed, and maintained data warehousing solutions using Google BigQuery to support business analytics and reporting.

· Developed custom tools using Java for monitoring, logging, and analyzing Big Data workflows, enabling improved debugging and system performance analysis.

· Designed and implemented cloud-based data pipelines and architectures, leveraging Google Cloud services like Dataflow, Bigtable, Dataform, and Data Fusion.

· Developed and optimized data pipelines using Databricks, ensuring seamless integration with various data sources and improving data accessibility for analytics.

· Utilized Scala to build and optimize data processing algorithms for terabyte-scale datasets in Hadoop and Spark environments.

· Designed and developed interactive dashboards and reports using Qlik Sense, providing data-driven insights to key business stakeholders.

· Integrated Apache Flink with various data sources such as Kafka, HDFS, and relational databases to streamline data ingestion and processing.

· Integrated RESTful APIs with third-party systems to enable seamless data exchange and interoperability.

· Created SSIS packages for data migration, data cleansing, and data integration tasks.

· Built robust applications using SQL, Python, and Spark, effectively managing various data formats including SCD, which increased data processing speed by 35%.

· Worked with Scala frameworks and libraries such as Akka for building concurrent and distributed applications, and Play for developing web applications.

· Integrated Java-based applications with Big Data ecosystems, including Hadoop, Spark, Kafka, and HBase.

· Designed and developed ETL processes to transform and load structured and unstructured data into Hadoop, ensuring data quality and consistency.

· Architected and managed large-scale NoSQL databases on Google Cloud Bigtable, ensuring low latency and high throughput.

· Maintained and organized repositories using GitHub for version control, ensuring code consistency and traceability across the development lifecycle.

· Designed and implemented backup and recovery strategies for Unix-based systems, ensuring data integrity and availability in case of system failures.

· Improved test coverage and reliability by suggesting and reviewing unit, integration, and functional tests during the code review process.

· Developed and orchestrated data integration workflows using Google Cloud Data Fusion for seamless data movement across environments.

· Designed and implemented scalable, fault-tolerant distributed systems to support high availability and reliability of services.

· Developed and designed complex data integration processes using Oracle Data Integrator (ODI) to extract, transform, and load (ETL) data from various sources into the target systems.

· Automated workflow management and monitoring using Airflow, achieving 60% reduction in manual intervention through task dependencies and SLA monitoring.

· Created RESTful APIs with Flask-RESTful, enhancing frontend-backend communication efficiency by 30%.

· Developed frameworks and libraries for migrating data from Azure and legacy databases to Google Cloud Storage, reducing migration time by 50%.

· Developed and implemented Spark applications using Python and Scala:

  • Designed and developed data pipelines to ingest structured and unstructured data into HDFS using Apache Spark.
  • Optimized Spark jobs for improved performance, scalability, and reliability.
  • Tuned Spark configurations to achieve optimal execution times of batch processes.
  • Created Hive tables and optimized queries, stored procedures, functions, and views on Hadoop clusters.
  • Performed data analysis tasks such as identifying trends in large datasets using Apache Spark MLlib library.
  • Collaborated with the team to design efficient ETL processes for loading data from various sources into the Hadoop ecosystem.
  • Deployed applications onto the cluster utilizing YARN resource manager to manage resources across multiple nodes of a cluster.

Software Engineer

Gap Inc
Hyderabad, TS
08.2018 - 12.2021

· Developed scalable reports and data solutions for Gap Supply Chain Processes, improving planning accuracy by 20%.

· Spearheaded the effort on standing up Planning Shared Services Platform to provide data to capabilities in o9 SaaS.

· Implemented a Python/PySpark framework on AWS Databricks clusters for config-driven Data Pipelines, reducing development and delivery time by 40%.

· Monitored and maintained APIs in production environments using tools like AWS CloudWatch and Datadog, proactively resolving issues to minimize downtime.

· Monitored compliance with Service Level Agreements (SLAs) within the BMC ticketing system, ensuring that resolution times met business requirements.

· Integrated Kubernetes with cloud providers AWS for seamless infrastructure management and auto-scaling capabilities.

· Created and managed ETL/ELT pipelines to ingest, transform, and load large datasets into BigQuery from various data sources.

· Implemented ETL workflows on Databricks using Delta Lake to ensure high-quality, scalable, and efficient data processing, reducing data latency by 60%.

· Analyzed and optimized Scala applications for performance, employing profiling tools and best coding practices.

· Managed and automated batch job scheduling using Control-M, ensuring timely execution and monitoring of critical business processes.

· Integrated Docker into CI/CD pipelines using tools like Jenkins/GitLab CI, automating the build, test, and deployment processes, resulting in faster and more reliable software releases.

· Collaborated with data scientists and software engineers to design, train, and evaluate models using TensorFlow and other ML frameworks.

· Designed and implemented ETL processes using SSIS to extract, transform, and load data from various sources into data warehouses.

· Developed and maintained data processing pipelines using Hadoop ecosystem tools such as HDFS, MapReduce, Hive, Pig, and HBase to support large-scale data analytics.

· Used Databricks SQL to write complex queries, create data visualizations, and provide insights for business stakeholders, improving decision-making by 70%.

· Integrated data from various sources including SQL Server, Excel, Azure, and other cloud-based data platforms into Power BI for comprehensive analysis.

· Integrated Kinesis with other AWS services and Big Data tools such as AWS Lambda, S3, Redshift, and EMR for end-to-end data processing solutions.

· Designed and implemented scalable data processing pipelines using Apache Beam with Google Cloud Dataflow.

· Automated ETL processes using ODI scheduling features and integrated with other enterprise scheduling tools to ensure timely data availability.

· Managed diverse data types including flat files, XML, YAML, Parquet, and Data Lake, achieving seamless data handling and reducing data errors by 25%.

· Developed data pipelines using Spark to store information into AWS S3 layers, Hive database and Snowflake.

· Designed and developed data integration/engineering workflows on big data technologies, sourcing data from SAP S/4HANA.

· Experienced in handling data in various file types: flat files, XML, Parquet, Delta Lake, etc.

Systems Engineer

TechM
Hyderabad, TS
06.2016 - 08.2018

· Developed Python scripts for data cleaning and standardization, improving data quality by 30%.

· Loaded data into Hive tables from Data Lakes using Sqoop, enhancing data availability and accessibility by 25%.

· Performed data transformations and analytics on large datasets using Spark.

· Optimized complex SQL queries, reducing query latency by 50% and enhancing overall system performance.

Education

Master of Science - Computer Science

Lewis University
Romeoville
05-2023

Bachelor of Science - Electronics And Communications

JNTU University
Guntur
05-2017

Skills

Programming Languages: Python, Java, Scala, SQL, PySpark, C#, HiveQL, Shell

Frameworks/Tools: Spark, Hadoop, Kubeflow, Airflow, Kubernetes, Databricks, Pandas, Ab Initio, Scikit, Jenkins, CI/CD, Kafka

Cloud Services/Databases: AWS, Azure, GCP, Presto, MongoDB, MySQL, PostgreSQL, Cassandra, Terraform, SAS, Oracle, Spark Structured Streaming, MS SQL Server, Teradata, NoSQL, Snowflake, Redshift, BigQuery

Concepts: EDA, Statistics, Machine Learning & Deep Learning (Regression, Naïve Bayes, Decision Trees, Random Forests, GBTs, SVM, K-Means, Neural Networks), ETL, Data Modelling, Data Warehousing, Dashboarding, Scikit Learn, Data Visualization, Data validation

Timeline

Java Spark Developer

United Health Group, Optum
10.2022 - Current

Software Engineer

Gap Inc
08.2018 - 12.2021

Systems Engineer

TechM
06.2016 - 08.2018

Master of Science - Computer Science

Lewis University

Bachelor of Science - Electronics And Communications

JNTU University
Venkatesh Chirumamilla