
Chetan Aitaraju

Data Engineer
New York, NY

Summary

  • Around 5 years of experience in Information Technology, with expertise in data analytics, data design, development, implementation, testing, and deployment of software applications in the finance and insurance domains
  • Working experience designing and implementing complete end-to-end Hadoop infrastructure using HDFS, MapReduce, Hive, HBase, Kafka, Sqoop, Spark, NoSQL, Postman, and Python
  • Created DataFrames and performed analysis using Spark SQL; strong knowledge of Spark Streaming and Spark machine learning libraries
  • Experienced in writing automated scripts for monitoring file systems and key MapR services
  • Implemented continuous integration and deployment (CI/CD) through Jenkins for Hadoop jobs
  • Good knowledge of Cloudera distributions and of Amazon Simple Storage Service (Amazon S3), AWS Redshift, Lambda, Amazon EC2, and Amazon EMR
  • Performed transformations on imported data and exported it back to RDBMS
  • Worked on Amazon Web Services (AWS) to integrate EMR with Spark 2, S3 storage, and Snowflake
  • Experience writing queries in HQL (Hive Query Language) to perform data analysis
  • Created Hive external and managed tables; implemented partitioning and bucketing on Hive tables for query optimization (see the sketch below)
  • Integrated Flume with Kafka, using Flume as both a producer and consumer (the FLAFKA pattern)
  • Good exposure to creating dashboards in reporting tools such as SAS, Tableau, Power BI, BO, and QlikView, using filters and sets while working with large volumes of data
  • Experience with databases such as Oracle, Teradata, Informix, and DB2
  • Experience with NoSQL stores such as MongoDB and HBase, and with PostgreSQL-based systems such as Greenplum
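A minimal Spark SQL sketch of the Hive partitioning and bucketing mentioned above; the table name, columns, and bucket count are hypothetical placeholders, not details from the projects listed below.

from pyspark.sql import SparkSession

# Hive support lets Spark manage partitioned/bucketed tables in the metastore
spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical table: partitioned by load date, bucketed by customer id
spark.sql("""
    CREATE TABLE IF NOT EXISTS claims (
        claim_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE,
        load_date   STRING
    )
    USING PARQUET
    PARTITIONED BY (load_date)
    CLUSTERED BY (customer_id) INTO 8 BUCKETS
""")

# Queries that filter on the partition column prune partitions automatically
spark.sql("SELECT COUNT(*) FROM claims WHERE load_date = '2024-01-01'").show()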

Overview

6 years of professional experience

Work History

Data Engineer

BNY
New York
09.2023 - Current
  • Installed and configured a multi-node cluster in the cloud on Amazon Web Services (AWS) EC2
  • Handled AWS management tools such as CloudWatch and CloudTrail
  • Stored log files in AWS S3
  • Enabled versioning on S3 buckets storing highly sensitive information
  • Integrated AWS DynamoDB with AWS Lambda to store item values and back up DynamoDB Streams
  • Automated routine AWS tasks such as snapshot creation using Python scripts
  • Designed data warehouses on platforms such as AWS Redshift, Azure SQL Data Warehouse, and other high-performance platforms
  • Installed and configured Apache Airflow against AWS S3 buckets and created DAGs to run workflows
  • Prepared scripts to automate the ingestion process using PySpark and Scala from sources such as APIs, AWS S3, Teradata, and Redshift
  • Created multiple scripts to automate ETL/ELT processes using PySpark from multiple sources (see the illustrative sketch after this role)
  • Developed PySpark scripts using Spark SQL and RDDs for data analysis, storing results back into S3
  • Developed Spark SQL code in Python to implement business logic
  • Worked with sequence files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing
  • Managed multiple Kubernetes clusters in a production environment
  • Wrote Spark applications for data validation, cleansing, transformation, and custom aggregation, and used the Spark engine and Spark SQL for data analysis, providing results to data scientists for further analysis
  • Handled data integrity checks using Hive queries, Hadoop, and Spark
  • Environment: AWS, JMeter, Kafka, Ansible, Jenkins, Docker, Linux, Git, CloudWatch, Python, Shell Scripting, Golang, WebSphere, Splunk, SoapUI, Kubernetes, Terraform, PowerShell.
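A minimal PySpark sketch of the kind of S3-based ETL/ELT automation described in the role above; the bucket paths, schema, and aggregation are hypothetical placeholders rather than details of the actual pipelines.

from pyspark.sql import SparkSession

# Hypothetical paths; the real pipeline's buckets and schemas are not shown here
RAW_PATH = "s3a://example-raw-bucket/transactions/"
CURATED_PATH = "s3a://example-curated-bucket/transactions_daily/"

spark = SparkSession.builder.appName("s3-etl-sketch").getOrCreate()

# Read raw JSON landed in S3 and expose it to Spark SQL
raw_df = spark.read.json(RAW_PATH)
raw_df.createOrReplaceTempView("transactions")

# Business logic expressed in Spark SQL (illustrative aggregation)
daily_df = spark.sql("""
    SELECT account_id,
           to_date(event_ts) AS event_date,
           SUM(amount)       AS total_amount,
           COUNT(*)          AS txn_count
    FROM transactions
    GROUP BY account_id, to_date(event_ts)
""")

# Write curated output back to S3, partitioned by date for efficient reads
(daily_df.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet(CURATED_PATH))

spark.stop()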

Data Engineer

Red Roof Inn
Albany, Ohio
01.2023 - 08.2023
  • Environment: Python, AWS (Glue, Lambda, S3, Redshift, RDS, Step Functions, Lake Formation, CloudWatch, CloudTrail, EC2, IAM, Elastic Load Balancing, SQS), Kafka, Kubernetes, Jenkins, Golang, Terraform, PowerShell, JMeter
  • Designed and maintained robust data pipelines utilizing AWS Glue, Lambda, and S3, focusing on data integration, cleaning, and management of Redshift and RDS databases
  • Developed automated workflows using Step Functions and built scalable data lakes with Lake Formation and S3
  • Monitored pipeline health and performance using CloudWatch and CloudTrail, optimizing infrastructure with EC2 instances and managing security through IAM roles and security groups
  • Conducted comprehensive security audits, implemented backup strategies, and trained team members in AWS tool usage
  • Achieved a 30% reduction in data processing time by integrating new data sources and enhancing pipeline reliability through automated monitoring
  • Contributed to maintaining and expanding Core Data platform pipelines in Scala and Python/Spark, adhering to strict uptime SLAs
  • Expanded platform capabilities by enhancing metadata parsing, extending the Metastore API, and integrating new APIs
  • Implemented batch and streaming data pipelines using Scala, Databricks, and Airflow to support dynamic data ingestion (see the sketch after this role)
  • Led the adoption of a Lakehouse architecture, collaborating with stakeholders to transition towards a unified data platform
  • Developed shared libraries in Scala and Python to streamline business logic across all data pipelines within the organization.
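An illustrative Airflow sketch (not taken from the actual project) of the kind of batch orchestration referenced in the role above; the DAG name, schedule, and task bodies are hypothetical placeholders, it assumes Airflow 2.4+ for the schedule argument, and the real pipelines also used Databricks and Scala components not shown here.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_from_s3(**context):
    # Placeholder: pull the day's raw files from S3 (e.g. via boto3 or AWS Glue)
    print("extracting raw data for", context["ds"])


def load_to_redshift(**context):
    # Placeholder: copy the transformed output into Redshift
    print("loading curated data for", context["ds"])


with DAG(
    dag_id="daily_ingestion_sketch",  # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule="@daily",                # Airflow 2.4+ argument
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    extract = PythonOperator(task_id="extract_from_s3", python_callable=extract_from_s3)
    load = PythonOperator(task_id="load_to_redshift", python_callable=load_to_redshift)

    extract >> load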

Data Engineer

Tech Mahindra
Hyderabad
03.2019 - 01.2022
  • Implemented Spark using Python and Spark SQL for efficient data processing, specializing in large-scale transformations and denormalization of relational datasets
  • Involved in the complete implementation lifecycle, including writing custom MapReduce jobs and optimizing Hive queries for querying and searching data in HDFS
  • Managed and monitored Hadoop clusters using Cloudera Manager, ensuring continuous performance and stability
  • Facilitated bidirectional data transfers between HDFS and Oracle databases using Sqoop (see the JDBC-based sketch after this role)
  • Installed and configured Cloudera Hadoop Distribution, optimizing environment setup for performance and scalability.
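The role above performed the HDFS/Oracle transfers with Sqoop; as a language-consistent illustration only, this minimal PySpark sketch shows the same bidirectional movement over JDBC. The connection string, credentials, table names, and paths are hypothetical placeholders, and the Oracle JDBC driver is assumed to be on the Spark classpath.

from pyspark.sql import SparkSession

# Hypothetical connection details; real credentials would come from a vault or config
JDBC_URL = "jdbc:oracle:thin:@//dbhost.example.com:1521/ORCLPDB1"
JDBC_PROPS = {"user": "etl_user", "password": "***", "driver": "oracle.jdbc.OracleDriver"}

spark = SparkSession.builder.appName("hdfs-oracle-transfer-sketch").getOrCreate()

# Oracle -> HDFS: pull a table over JDBC and land it as Parquet on HDFS
orders = spark.read.jdbc(JDBC_URL, "SALES.ORDERS", properties=JDBC_PROPS)
orders.write.mode("overwrite").parquet("hdfs:///data/raw/orders")

# HDFS -> Oracle: push a curated dataset back into a reporting table
curated = spark.read.parquet("hdfs:///data/curated/order_summary")
curated.write.mode("append").jdbc(JDBC_URL, "SALES.ORDER_SUMMARY", properties=JDBC_PROPS)

spark.stop()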

Education

Master of Science

Cumberland University
April 2024

Bachelor’s - Computer Science

Pacific Institute of Engineering Technology
May 2019

AWS Certified DevOps Engineer - Professional | Amazon Web Services (AWS)
AWS Certified Developer - Associate | Amazon Web Services (AWS)

Skills

  • Environment: Hadoop, HDFS, Hive, MapReduce, Impala, Sqoop, SQL, Informatica, Python, Flume, Spark, YARN, Pig, Oozie, Linux, AWS, Tableau, Maven, Jenkins, Autosys, Oracle, SQL Server, Teradata
  • Big Data Ecosystem: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Kafka, Flume, Cassandra, Impala, Oozie, Zookeeper, MapR, Amazon Web Services (AWS), EMR
  • Cloud Technologies: AWS, Azure, Google Cloud Platform (GCP)
  • Databases: Oracle 11g/10g/9i, MySQL, DB2, MS SQL Server, HBase
  • Programming and Query Languages: Java, SQL, Python (Pandas, NumPy, SciPy), NoSQL, PySpark, Scala
  • Big Data Tools / Cloud / Visualization / Other Tools: Databricks, Hadoop Distributed File System (HDFS), Hive, Pig, Sqoop, MapReduce, Spring Boot, Flume, Azure Databricks, Azure Data Explorer, Linux, PuTTY, Bash Shell, Unix, Tableau, Power BI, SAS, Web Intelligence, Crystal Reports, Dashboard Design

Accomplishments

  • Provided AWS support, leveraging EC2 for computing and S3 for storage, while deploying Lambda functions for automating EMR jobs in Data Lake environments
  • Scheduled Spark applications on AWS EMR clusters, utilizing event-driven AWS Lambda functions to trigger various resources
  • Led the migration strategy from SAP Data Warehouse to AWS Redshift, performing impact analyses and providing feedback on mapping changes
  • Developed project estimates and coordinated project progress to ensure timely delivery.

Work Preference

Work Type

Full Time, Contract Work

Work Location

On-Site, Remote

