
Kiran Kumar Y

Dallas, TX

Summary

4+ years of working experience as a Data Engineer (MS Azure / Amazon Web Services - AWS) and Python Developer, with a solid understanding of Data Analysis and Data Engineering concepts using the Hadoop and Spark frameworks and Big Data components such as HDFS, MapReduce, YARN, Pig, Hive, Spark (Spark Streaming, Spark DataFrames, Spark RDDs), Sqoop, Oozie, Python, Kafka, and Machine Learning algorithms.

Overview

4 years of professional experience

Work History

Big Data Engineer

Infosys Limited
05.2022 - Current

Data Engineer / Hadoop & Spark Developer

Project Description:

The project leverages the largest data set for products sold in the home space. Our team treats data as an asset, determining how to maximize its business value and extend our competitive advantage. My role involves designing and delivering data warehouses, data lakes, self-service tooling, real-time streaming, and Big Data solutions for multiple functional areas using modern cloud technologies.

  • Developed a Spark Streaming script that consumes topics from the distributed messaging source Kafka and periodically pushes batches of data to Spark for real-time processing (a brief sketch follows this list).
  • Built data pipelines for reporting, alerting, and data mining; experienced with table design and data management using HDFS, Hive, Impala, Sqoop, MySQL, and Kafka.
  • Created shell scripts to handle various jobs such as MapReduce, Hive, Pig, and Spark, based on requirements.
  • Used Hive techniques such as bucketing and partitioning to create tables.
  • Developed ETL pipelines into and out of data warehouses using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake.
  • Worked on AWS to aggregate clean files in Amazon S3 and on Amazon EC2 clusters to deploy files into buckets.
  • Involved in data modeling using Star and Snowflake schemas.
  • Responsible for developing a data pipeline with Amazon AWS to extract data from weblogs and store it in HDFS.
  • Migrated data from AWS S3 to HDFS using Kafka.
  • Implemented Jenkins build pipelines to drive all microservice builds out to the Docker registry and deploy them to Kubernetes.
  • Worked with NoSQL databases such as HBase and Cassandra to retrieve and load data for real-time processing using REST APIs.
  • Responsible for transforming and loading large sets of structured, semi-structured, and unstructured data.
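
For illustration, a minimal PySpark Structured Streaming sketch of the Kafka-to-Spark pattern described above; the broker address, the "web-events" topic, and the HDFS paths are hypothetical placeholders, not the project's actual configuration.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    # Subscribe to a Kafka topic (broker and topic are hypothetical).
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")
              .option("subscribe", "web-events")
              .load())

    # Kafka delivers key/value as binary; cast the payload to string first.
    parsed = events.selectExpr("CAST(value AS STRING) AS payload")

    # Write each micro-batch to HDFS; checkpointing makes the stream restartable.
    query = (parsed.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/web-events")
             .option("checkpointLocation", "hdfs:///checkpoints/web-events")
             .trigger(processingTime="1 minute")
             .start())

    query.awaitTermination()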

Software Development Engineer

Amazon
06.2022 - 09.2022
  • Provided 24x7 (including weekend) support to address critical failures.
  • Worked with Azure Cloud Services (PaaS and IaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.
  • Extracted, transformed, and loaded data from source systems to Azure Storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL in Azure Data Lake Analytics.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
  • Completed online data transfer from AWS S3 to Azure Blob Storage using Azure Data Factory (ADF).
  • Developed Spark applications in Scala and Python, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python (a brief sketch follows this list).
  • Performed validation and verification of software at all testing phases, including Functional, System Integration, End-to-End, Regression, Sanity, User Acceptance, Smoke, Disaster Recovery, Production Acceptance, and Pre-prod testing.
  • Generated various graphical capacity-planning reports using Python packages such as NumPy, SciPy, Pandas, and Matplotlib.
  • Worked with Snowflake utilities, SnowSQL, Snowpipe, and Big Data modeling techniques using Python.
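
For illustration, a minimal sketch of rewriting a HiveQL aggregation as Spark DataFrame transformations, as described above; the "orders" table and its columns are hypothetical placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive-to-spark-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Original HiveQL (hypothetical):
    #   SELECT category, SUM(amount) AS total
    #   FROM orders WHERE order_date >= '2022-01-01'
    #   GROUP BY category;
    totals = (spark.table("orders")
              .filter(F.col("order_date") >= "2022-01-01")
              .groupBy("category")
              .agg(F.sum("amount").alias("total")))

    totals.show()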

Big Data Engineer

NIC Infotek
06.2021 - 05.2022
  • Worked on analyzing the Hadoop cluster and different Big Data analytic tools, including Pig, Hive, HBase, and Sqoop.
  • Defined and extracted data from multiple sources, integrated disparate data into a common data model, and loaded it into a target database, application, or file using efficient programming processes.
  • Developed MapReduce applications using the Hadoop MapReduce programming framework and used compression techniques to optimize MapReduce jobs.
  • Worked on distributed computing architectures such as Hadoop, Kubernetes, and Docker containers.
  • Developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Worked on AWS Data Pipeline to configure data loads from S3 to Redshift; downloaded and uploaded data files (with ETL) to AWS using S3 components, and used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in an Amazon S3 bucket (a brief sketch follows this list).
  • Implemented Big Data analytics and advanced data science techniques to identify trends, patterns, and discrepancies in petabytes of data using Hive, Hadoop, Python, HDFS, MapReduce, and machine learning.
  • Designed and developed scalable, efficient data pipeline processes to handle data ingestion, cleansing, transformation, and integration using Sqoop, Hive, Python, and Impala.
  • Developed various Python scripts to find vulnerabilities in SQL queries via SQL injection, permission checks, and analysis.
  • Developed Oozie actions (Hive, shell, and Java) to submit and schedule applications to run in the Hadoop cluster.
  • Captured data integration metadata, lineage, and catalog information through configuration and parameterization.
  • Implemented a continuous delivery pipeline with Docker and GitHub.
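
For illustration, a minimal Python sketch of an S3-to-Redshift load of the kind described above, using boto3 and psycopg2; the bucket, table, connection details, and IAM role are all hypothetical placeholders.

    import boto3
    import psycopg2

    # Stage the cleaned extract in S3 (bucket and key are hypothetical).
    s3 = boto3.client("s3")
    s3.upload_file("cleaned/events.csv", "example-bucket", "staging/events.csv")

    # Load the staged file into Redshift with a COPY command
    # (cluster endpoint, credentials, and IAM role are hypothetical).
    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="analytics",
        user="etl_user",
        password="...",
    )
    with conn, conn.cursor() as cur:
        cur.execute("""
            COPY events
            FROM 's3://example-bucket/staging/events.csv'
            IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
            CSV;
        """)
    conn.close()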

Data Engineer

Clique Infotec
01.2019 - 01.2021
  • Created data pipelines in multiple instances to load data from DynamoDB into HDFS.
  • Successfully executed performance tuning of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Used Apache Flume to collect and aggregate large amounts of log data and stage it in HDFS.
  • Used PROC SQL, PROC IMPORT, and the SAS DATA step to clean, validate, and manipulate data.
  • Collected log data from web servers and integrated it into HDFS.
  • Wrote MapReduce programs to handle semi-structured and unstructured data such as JSON, Avro data files, and sequence files of log data.
  • Developed Kafka producers and consumers for message handling (a brief sketch follows this list).
  • Developed Oozie workflows to collect and manage data for end-to-end processing.
  • Analyzed large and critical datasets using HDFS, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop, and ZooKeeper.
  • Migrated HiveQL queries to Spark SQL for improved performance.
  • Involved in the partitioning and bucketing of data stored in Hive.
  • Extensively involved in developing RESTful APIs using the JSON library of the Play Framework.
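
For illustration, a minimal Kafka producer/consumer sketch in Python using the kafka-python client; the broker address and the "log-events" topic are hypothetical placeholders.

    import json
    from kafka import KafkaProducer, KafkaConsumer

    # Producer: serialize messages as JSON and publish to the topic.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("log-events", {"level": "INFO", "msg": "user login"})
    producer.flush()

    # Consumer: join a consumer group and deserialize each JSON payload.
    consumer = KafkaConsumer(
        "log-events",
        bootstrap_servers="localhost:9092",
        group_id="log-handlers",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
        auto_offset_reset="earliest",
    )
    for record in consumer:
        print(record.value)  # each record carries the deserialized payload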


Education

Master of Science - Computer And Information Sciences

Texas A&M University - Kingsville
Kingsville, TX
05.2022

Bachelor of Science - Electrical, Electronics And Communications Engineering

Marri Laxman Reddy Institute of Technology
Hyderabad, India
05.2019

Skills

  • Hadoop, MapReduce, Spark, Databricks, Pig, Hive, HBase, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Ambari, Elasticsearch, Parquet, Snappy, Airflow
  • Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapReduce, Apache, EMR
  • Azure SQL Database, Azure Data Lake (ADL), Azure Data Factory (ADF), Azure SQL Data Warehouse, Azure Service Bus, Azure Key Vault, Azure Analysis Services (AAS), Azure Blob Storage, Azure Search, Azure App Service
  • EMR, S3, EC2, VPC, Redshift, Lambda, DynamoDB, RDS, SNS, SQS, Glue
  • Oracle, SQL Server, Cassandra, Teradata, PostgreSQL, HBase, MongoDB
  • Scala, SQL, PL/SQL, R, Python (Pandas, NumPy, SciPy, Scikit-Learn, Seaborn, Matplotlib, NLTK), Shell Scripting
  • Azure DevOps, Jenkins, Ant, Maven
  • Apache Tomcat, WebLogic, WebSphere
  • Linux, Windows, Ubuntu, Unix
  • Amazon Web Services (AWS), MS Azure

Timeline

Software Development Engineer

Amazon
06.2022 - 09.2022

Big Data Engineer

Infosys Limited
05.2022 - Current

Big Data Engineer

NIC Infotek
06.2021 - 05.2022

Data Engineer

Clique Infotec
01.2019 - 01.2021

Master of Science - Computer And Information Sciences

Texas A&M University - Kingsville

Bachelor of Science - Electrical, Electronics And Communications Engineering

Marri Laxman Reddy Institute of Technology