Summary

Overview

Work History

Education

Skills

Ai Ml Exposure

Timeline

Nirmalharsha Kolluru

Data & Analytics - Senior Software Developer

Summary

Senior Software Engineer with expertise in data engineering and generative AI. Experienced with LangChain for LLM applications and PyTorch for machine learning. Passionate about applying AI to create innovative solutions.

Overview

years of professional experience

Work History

Senior Software Developer

Savvas Learning Company

05.2022 - Current

Data Engineering & ETL :

Developed and optimized ETL pipelines, integrating data from diverse sources into data warehouses.
Proficient in data mapping, requirement analysis, and data transformation.
Experience with Pentaho and Talend for ETL workflow development and maintenance.
Implemented and automated AWS Lambda functions for efficient data processing.
Utilized Amazon S3 Event Notifications with SNS for real-time object-level event handling.
Configured and managed Stitch replication for data extraction and loading.
Implemented monitoring solutions using Datadog for ETL performance tracking and optimization.

GEN-AI & Machine Learning :

Experience leveraging LangChain for building applications powered by large language models (LLMs).
Proficient in utilizing PyTorch for developing and deploying machine learning models.
Experience in utilizing LLMs for data transformation and data anomally detection.

Team Collaboration & Management :

Collaborated with and managed offshore teams, ensuring effective communication and project coordination.

Lead Applications Developer

Marsh McLennan

10.2020 - 05.2022

Designed and implemented robust, scalable data ingestion pipelines on AWS, utilizing Lambda, CloudWatch, and Step Functions to automate data flow into the data lake.
Developed and deployed AWS Lambda functions incorporating HQL queries, enhancing data processing efficiency and leveraging AWS SNS for real-time notification services.
Engineered high-performance ETL jobs using Scala to process diverse data sources (pipe-delimited CSV, Cobol, fixed-width text), implementing rigorous data validation and format checks before ingestion into Hive tables.
Developed Scala maps for data aggregation and filtering, optimizing data delivery for Tableau dashboards and enabling insightful business visualizations.
Built a comprehensive data validation framework to enforce critical business rules, ensuring data accuracy and integrity within the data lake.
Automated data aggregation for analytics consumption by developing Lambda functions to trigger Hive and Spark jobs on EMR clusters, transforming processed data into actionable insights.
Enhanced Spark job performance through meticulous configuration tuning, significantly improving data processing speeds and resource utilization.
Orchestrated time-based data ingestion triggers using AWS CloudWatch, ensuring timely and reliable data availability for downstream analytics.
Contributed to the development of predictive analytics models for key business initiatives, including COVID-19 data analysis, fraud prediction, and risk management savings metrics.
Collaborated effectively with data owners, Business Units, the Data Integration team, and customers in a fast-paced Agile/Scrum environment, ensuring clear communication and timely project delivery.
Utilized IntelliJ IDE and Git version control with Bitbucket for efficient development, version control, and team collaboration.

Apache Spark Engineer

Transamerica Corporation

10.2018 - 10.2020

Developed batch data processing solutions using Spark Scala and Spark SQL.
Created Hive tables, loaded data, and wrote HQL queries.
Resolved Hive performance issues using partitioning, bucketing, and indexing.
Executed Hive queries in Hue and CLI.
Contributed to schema mapping and external column generation.
Implemented on-premises Change Data Capture (CDC) processes.
Used Python, U-SQL, and PowerShell to process data in Azure Data Lake Storage (ADLS).
Built and automated daily data pipelines using Azure Runbooks.

Sr Bigdata Engineer

ODH

03.2018 - 10.2018

Developed Spark Scala applications for large-scale data processing and transformation.
Engineered data ingestion pipelines from diverse sources (DB2, PostgreSQL, S3) using Spark JDBC and CSV.
Implemented complex data transformations and aggregations using Spark SQL DataFrames.
Utilized AWS services (SNS, SQS, EMR, Lambda, Glue) for data processing and deployment.
Designed and optimized data pipelines using NiFi and Kafka for real-time and batch data processing.
Implemented data validation, CDC processes, and data format conversions (CSV to JSONB/HL7).
Automated deployments using Terraform, Jenkins, and Subversion.
Managed containerization using Docker and version control with Bitbucket.

Data Engineer

OneMarket

03.2017 - 03.2018

Engineered data transformation pipelines using StreamSets for diverse data sources.
Utilized Google Cloud Storage and Amazon S3 for data storage from Pub/Sub and vendor feeds.
Developed and executed queries on BigQuery, importing data in Avro, CSV, and JSON formats.
Performed data analysis using Databricks with Spark SQL, SQL queries, and Scala.
Implemented streaming data pipelines using StreamSets, capturing data from Pub/Sub and storing it in Google Cloud.
Designed and implemented multi-tenancy dashboards and visualizations in Looker.
Managed project tasks and followed Agile/Scrum methodologies using Jira.

Hadoop Developer

Dell International Services

06.2013 - 06.2014

Developed and optimized MapReduce jobs for efficient large-scale data processing.
Engineered Pig scripts to perform complex data transformations, including joins, filtering, and aggregations.
Implemented data ingestion solutions using Sqoop for loading data into HDFS, including scheduled incremental loads.
Designed and optimized Hive tables with partitioning and bucketing, and wrote HiveQL queries for data analysis and reporting.
Implemented real-time data streaming using Kafka and developed data pipelines using Flume for weblog data ingestion.
Automated data workflows and job scheduling using Oozie and shell scripting to ensure reliable data processing.
Contributed to data optimization by tuning Hive queries and performing data scrubbing.
Participated in Agile/Scrum development methodologies, contributing to all phases of the development lifecycle.