Highly skilled data engineer with over five years of experience designing, developing, and optimizing data integration solutions in diverse environments. Adept at leveraging a wide range of technologies, including Apache Hadoop, Spark, Kafka, AWS, and various relational and NoSQL databases, to build robust ETL pipelines and real-time data processing systems. Experienced in Agile methodologies, data orchestration with tools such as NiFi and Airflow, and containerization with Docker and Kubernetes. Proven ability to implement data governance frameworks that ensure data quality and compliance, and to derive insights from large datasets using advanced analytics tools such as Zeppelin and Jupyter Notebooks. Extensive experience with cloud-based data warehousing, cloud infrastructure management, and CI/CD automation. Strong background in developing scalable data architectures and maintaining enterprise data warehouses, with hands-on experience in machine learning, data visualization, and performance tuning. A collaborative team player with a track record of delivering high-quality data solutions that drive business value.
Overview
6 years of professional experience
2 certifications
Work History
Data Engineer Intern
Thrive Software Solutions
WA
02.2024 - 05.2024
Utilized Apache Zeppelin and Jupyter Notebooks for advanced analytics, deriving insights from large datasets through statistical techniques and machine learning algorithms
Managed data orchestration and workflows efficiently with Apache NiFi and Luigi, handling various data formats including JSON, XML, Parquet, CSV, and ORC
Used Docker for containerization and Kubernetes for orchestration, facilitating the deployment and management of containerized applications
Implemented data governance frameworks with Apache Atlas and Collibra, ensuring data quality, privacy, and regulatory compliance
Leveraged Apache Kafka Streams and Amazon Kinesis for real-time data processing, optimizing streaming data pipelines for high-throughput and real-time analytics.
Graduate Research Assistant
Northern Illinois University
Dekalb, IL
01.2023 - 01.2024
Enhanced distributed data processing efficiency by leveraging Apache Hadoop, Spark, and Flink, focusing on in-memory and real-time stream processing
Implemented advanced resource management and scalable architectures using containerization and load balancing techniques
Developed optimized ETL processes with incremental loading and real-time data processing capabilities using Apache Kafka
Integrated automated monitoring and self-healing mechanisms into data pipelines, utilizing Apache Airflow for workflow orchestration
Ensured data quality and optimized performance by incorporating robust validation steps and profiling tools to identify and resolve bottlenecks.
AWS Data Engineer
Mindtree Ltd
Hyderabad
11.2020 - 07.2022
Developed a cloud migration strategy and implemented best practices using AWS services such as Database Migration Service (DMS) and Server Migration Service
Set up and built AWS infrastructure using resources such as VPC, EC2, S3, DynamoDB, IAM, EBS, Route 53, SNS, SES, SQS, CloudWatch, CloudTrail, Security Groups, Auto Scaling, and RDS through CloudFormation templates
Implemented Kubernetes with Docker for auto-scaling and continuous integration (CI), deploying Docker images through Kubernetes and using the Kubernetes dashboard for monitoring
Utilized AWS Lambda for serverless computing and trigger-based code execution
Implemented data warehouse solutions in Amazon Redshift and migrated data from various databases to AWS services
Developed Bash and Python scripts for AWS infrastructure creation and automation tasks
Orchestrated and migrated CI/CD processes using CloudFormation, Terraform, and Docker, set up across OpenShift, AWS, and VPCs
Developed Python programs for automating tasks like extracting metadata and lineage from tools, saving significant manual effort
Utilized Spark for improving performance and optimizing existing algorithms in Hadoop environments
Integrated real-time monitoring for data ingestion processes using AWS CloudWatch
Configured Airflow connection to AWS EMR cluster and developed bash shell bootstrap scripts for initializing the cluster with necessary configurations
Defined, created, and deployed Star Schema, Snowflake Schema, and Dimensional Data Modeling on an Enterprise Data Warehouse (EDW).
Big Data Engineer
Arcesium
Hyderabad
07.2018 - 10.2020
Worked in Agile environments using tools like Rally to maintain user stories and tasks
Utilized Agile methodology and SCRUM process, providing daily reports and participating in design and development phases
Developed Spark/PySpark-based ETL pipelines for migrating credit card transactions, account, and customer data into an enterprise Hadoop Data Lake
Migrated MapReduce jobs to Spark for better performance and used Spark RDDs, Python, and Scala for data transformations
Maintained data integration programs in Hadoop and RDBMS environments from both structured and semi-structured data sources
Developed data pipelines using Spark, Hive, Pig, Python, Impala, and HBase
Utilized AWS services such as EMR, S3, Lambda, and SNS for data processing and storage
Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL
Designed SSIS Packages for ETL from various environments into SQL Server for SSAS cubes
Transformed Teradata scripts and stored procedures to SQL and Python for Snowflake's cloud platform
Defined, created, and deployed Star Schema, Snowflake Schema, and Dimensional Data Modeling on an EDW
Implemented Composite server for data virtualization and created restricted data access views using a REST API
Batch processed data from S3 to MongoDB, PostgreSQL, and MySQL
Queried and analyzed data from Cassandra using CQL and joined various tables using Spark and Scala
Built and published customized interactive Tableau reports and dashboards
Created multiple dashboards in Tableau for various business needs and used SQL Server Reporting Services (SSRS) for formatted reports
Performed performance tuning on Hive queries and UDFs
Supervised data profiling and validation to ensure accuracy between source and target systems
Configured topics in new Kafka clusters across environments and ingested data into Hadoop and Cassandra using Kafka
Implemented Apache Drill on Hadoop to join data from SQL and NoSQL data stores.
Hadoop Developer
GENPACT
Hyderabad
01.2018 - 06.2018
Installed the Oozie workflow engine to run multiple Hive and Pig jobs
Developed simple to complex MapReduce jobs using Hive and Pig
Developed MapReduce programs for data analysis and data cleaning
Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements
Integrated external data sources and APIs into GCP data solutions, ensuring data quality and consistency
Built data transformation pipelines using GCP services such as Dataflow and Apache Beam to cleanse, normalize, and enrich data
Built machine learning models to showcase big data capabilities using PySpark
Designed, implemented, and deployed within a customer’s existing Hadoop / Cassandra cluster a series of custom parallel algorithms for various customer-defined metrics and unsupervised learning models
Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
Extensively used SSIS transformations such as Lookup, Derived Column, Data Conversion, Aggregate, Conditional Split, SQL Task, Script Task, and Send Mail Task
Performed data cleansing, enrichment, mapping tasks and automated data validation processes to ensure meaningful and accurate data was reported efficiently
Implemented Apache Pig scripts to load data from and store data into Hive.
Education
Master of Science - Management Information Systems