
Chetan Aitaraju

Data Engineer
New York, NY

Summary

  • Around 5 years of experience in Information Technology, with expertise in data analytics, data design, development, implementation, testing, and deployment of software applications in the finance and insurance domains
  • Working experience designing and implementing complete end-to-end Hadoop infrastructure using HDFS, MapReduce, Hive, HBase, Kafka, Sqoop, Spark, NoSQL, Postman, and Python
  • Created DataFrames and performed analysis using Spark SQL; strong knowledge of Spark Streaming and Spark machine learning libraries
  • Experienced in writing automated scripts for monitoring file systems and key MapR services
  • Implemented continuous integration and deployment (CI/CD) through Jenkins for Hadoop jobs
  • Good knowledge of Cloudera distributions and of Amazon Simple Storage Service (Amazon S3), AWS Redshift, Lambda, Amazon EC2, and Amazon EMR
  • Performed transformations on imported data and exported it back to RDBMS
  • Worked on Amazon Web Services (AWS) to integrate EMR with Spark 2, S3 storage, and Snowflake
  • Experience writing queries in HQL (Hive Query Language) to perform data analysis
  • Created Hive external and managed tables; implemented partitioning and bucketing on Hive tables for query optimization (see the sketch below)
  • Integrated Flume with Kafka, using Flume as both a producer and consumer (the FLAFKA pattern)
  • Good exposure to creating dashboards in reporting tools such as SAS, Tableau, Power BI, BO, and QlikView, using filters and sets while working with large volumes of data
  • Experience with databases such as Oracle, Teradata, Informix, and DB2
  • Experience with NoSQL stores such as MongoDB and HBase, and with PostgreSQL-based systems such as Greenplum
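A minimal Spark SQL sketch of the Hive partitioning and bucketing mentioned above; the table name, columns, and bucket count are hypothetical placeholders, not details from the projects listed below.

from pyspark.sql import SparkSession

# Hive support lets Spark manage partitioned/bucketed tables in the metastore
spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical table: partitioned by load date, bucketed by customer id
spark.sql("""
    CREATE TABLE IF NOT EXISTS claims (
        claim_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE,
        load_date   STRING
    )
    USING PARQUET
    PARTITIONED BY (load_date)
    CLUSTERED BY (customer_id) INTO 8 BUCKETS
""")

# Queries that filter on the partition column prune partitions automatically
spark.sql("SELECT COUNT(*) FROM claims WHERE load_date = '2024-01-01'").show()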

Overview

6 years of professional experience

Work History

Data Engineer

BNY
New York
09.2023 - Current
  • Installed and configured a multi-node cluster in the cloud on Amazon Web Services (AWS) EC2
  • Handled AWS management tools such as CloudWatch and CloudTrail
  • Stored log files in AWS S3
  • Enabled versioning on S3 buckets storing highly sensitive information
  • Integrated AWS DynamoDB with AWS Lambda to store item values and back up DynamoDB Streams
  • Automated routine AWS tasks such as snapshot creation using Python scripts
  • Designed data warehouses on platforms such as AWS Redshift, Azure SQL Data Warehouse, and other high-performance platforms
  • Installed and configured Apache Airflow against AWS S3 buckets and created DAGs to run workflows
  • Prepared scripts to automate the ingestion process using PySpark and Scala from sources such as APIs, AWS S3, Teradata, and Redshift
  • Created multiple scripts to automate ETL/ELT processes using PySpark from multiple sources (see the illustrative sketch after this role)
  • Developed PySpark scripts using Spark SQL and RDDs for data analysis, storing results back into S3
  • Developed Spark SQL code in Python to implement business logic
  • Worked with sequence files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing
  • Managed multiple Kubernetes clusters in a production environment
  • Wrote Spark applications for data validation, cleansing, transformation, and custom aggregation, and used the Spark engine and Spark SQL for data analysis, providing results to data scientists for further analysis
  • Handled data integrity checks using Hive queries, Hadoop, and Spark
  • Environment: AWS, JMeter, Kafka, Ansible, Jenkins, Docker, Linux, Git, CloudWatch, Python, Shell Scripting, Golang, WebSphere, Splunk, SoapUI, Kubernetes, Terraform, PowerShell.
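A minimal PySpark sketch of the kind of S3-based ETL/ELT automation described in the role above; the bucket paths, schema, and aggregation are hypothetical placeholders rather than details of the actual pipelines.

from pyspark.sql import SparkSession

# Hypothetical paths; the real pipeline's buckets and schemas are not shown here
RAW_PATH = "s3a://example-raw-bucket/transactions/"
CURATED_PATH = "s3a://example-curated-bucket/transactions_daily/"

spark = SparkSession.builder.appName("s3-etl-sketch").getOrCreate()

# Read raw JSON landed in S3 and expose it to Spark SQL
raw_df = spark.read.json(RAW_PATH)
raw_df.createOrReplaceTempView("transactions")

# Business logic expressed in Spark SQL (illustrative aggregation)
daily_df = spark.sql("""
    SELECT account_id,
           to_date(event_ts) AS event_date,
           SUM(amount)       AS total_amount,
           COUNT(*)          AS txn_count
    FROM transactions
    GROUP BY account_id, to_date(event_ts)
""")

# Write curated output back to S3, partitioned by date for efficient reads
(daily_df.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet(CURATED_PATH))

spark.stop()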

Data Engineer

Red Roof Inn
Albany, Ohio
01.2023 - 08.2023
  • Environment: Python, AWS (Glue, Lambda, S3, Redshift, RDS, Step Functions, Lake Formation, CloudWatch, CloudTrail, EC2, IAM, Elastic Load Balancing, SQS), Kafka, Kubernetes, Jenkins, Golang, Terraform, PowerShell, JMeter
  • Designed and maintained robust data pipelines utilizing AWS Glue, Lambda, and S3, focusing on data integration, cleaning, and management of Redshift and RDS databases
  • Developed automated workflows using Step Functions and built scalable data lakes with Lake Formation and S3
  • Monitored pipeline health and performance using CloudWatch and CloudTrail, optimizing infrastructure with EC2 instances and managing security through IAM roles and security groups
  • Conducted comprehensive security audits, implemented backup strategies, and trained team members in AWS tool usage
  • Achieved a 30% reduction in data processing time by integrating new data sources and enhancing pipeline reliability through automated monitoring
  • Contributed to maintaining and expanding Core Data platform pipelines in Scala and Python/Spark, adhering to strict uptime SLAs
  • Expanded platform capabilities by enhancing metadata parsing, extending the Metastore API, and integrating new APIs
  • Implemented batch and streaming data pipelines using Scala, Databricks, and Airflow to support dynamic data ingestion (see the sketch after this role)
  • Led the adoption of a Lakehouse architecture, collaborating with stakeholders to transition towards a unified data platform
  • Developed shared libraries in Scala and Python to streamline business logic across all data pipelines within the organization.
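An illustrative Airflow sketch (not taken from the actual project) of the kind of batch orchestration referenced in the role above; the DAG name, schedule, and task bodies are hypothetical placeholders, it assumes Airflow 2.4+ for the schedule argument, and the real pipelines also used Databricks and Scala components not shown here.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_from_s3(**context):
    # Placeholder: pull the day's raw files from S3 (e.g. via boto3 or AWS Glue)
    print("extracting raw data for", context["ds"])


def load_to_redshift(**context):
    # Placeholder: copy the transformed output into Redshift
    print("loading curated data for", context["ds"])


with DAG(
    dag_id="daily_ingestion_sketch",  # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule="@daily",                # Airflow 2.4+ argument
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    extract = PythonOperator(task_id="extract_from_s3", python_callable=extract_from_s3)
    load = PythonOperator(task_id="load_to_redshift", python_callable=load_to_redshift)

    extract >> load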

Data Engineer

Tech Mahindra
Hyderabad
03.2019 - 01.2022
  • Implemented Spark using Python and Spark SQL for efficient data processing, specializing in large-scale transformations and denormalization of relational datasets
  • Involved in the complete implementation lifecycle, including writing custom MapReduce jobs and optimizing Hive queries for querying and searching data in HDFS
  • Managed and monitored Hadoop clusters using Cloudera Manager, ensuring continuous performance and stability
  • Facilitated bidirectional data transfers between HDFS and Oracle databases using Sqoop (see the JDBC-based sketch after this role)
  • Installed and configured Cloudera Hadoop Distribution, optimizing environment setup for performance and scalability.
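The role above performed the HDFS/Oracle transfers with Sqoop; as a language-consistent illustration only, this minimal PySpark sketch shows the same bidirectional movement over JDBC. The connection string, credentials, table names, and paths are hypothetical placeholders, and the Oracle JDBC driver is assumed to be on the Spark classpath.

from pyspark.sql import SparkSession

# Hypothetical connection details; real credentials would come from a vault or config
JDBC_URL = "jdbc:oracle:thin:@//dbhost.example.com:1521/ORCLPDB1"
JDBC_PROPS = {"user": "etl_user", "password": "***", "driver": "oracle.jdbc.OracleDriver"}

spark = SparkSession.builder.appName("hdfs-oracle-transfer-sketch").getOrCreate()

# Oracle -> HDFS: pull a table over JDBC and land it as Parquet on HDFS
orders = spark.read.jdbc(JDBC_URL, "SALES.ORDERS", properties=JDBC_PROPS)
orders.write.mode("overwrite").parquet("hdfs:///data/raw/orders")

# HDFS -> Oracle: push a curated dataset back into a reporting table
curated = spark.read.parquet("hdfs:///data/curated/order_summary")
curated.write.mode("append").jdbc(JDBC_URL, "SALES.ORDER_SUMMARY", properties=JDBC_PROPS)

spark.stop()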

Education

Master of Science

Cumberland University
April 2024

Bachelor’s - Computer Science

Pacific Institute of Engineering Technology
May 2019

AWS Certified DevOps Engineer - Professional | Amazon Web Services (AWS)
AWS Certified Developer - Associate | Amazon Web Services (AWS)

Skills

  • Environment: Hadoop, HDFS, Hive, MapReduce, Impala, Sqoop, SQL, Informatica, Python, Flume, Spark, YARN, Pig, Oozie, Linux, AWS, Tableau, Maven, Jenkins, Autosys, Oracle, SQL Server, Teradata
  • Big Data Ecosystem: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Kafka, Flume, Cassandra, Impala, Oozie, Zookeeper, MapR, Amazon Web Services (AWS), EMR
  • Cloud Technologies: AWS, Azure, Google Cloud Platform (GCP)
  • Databases: Oracle 11g/10g/9i, MySQL, DB2, MS SQL Server, HBase
  • Programming and Query Languages: Java, SQL, Python (Pandas, NumPy, SciPy), NoSQL, PySpark, Scala
  • Big Data Tools / Cloud / Visualization / Other Tools: Databricks, Hadoop Distributed File System (HDFS), Hive, Pig, Sqoop, MapReduce, Spring Boot, Flume, Azure Databricks, Azure Data Explorer, Linux, PuTTY, Bash Shell, Unix, Tableau, Power BI, SAS, Web Intelligence, Crystal Reports, Dashboard Design

Accomplishments

  • Provided AWS support, leveraging EC2 for computing and S3 for storage, while deploying Lambda functions for automating EMR jobs in Data Lake environments
  • Scheduled Spark applications on AWS EMR clusters, utilizing event-driven AWS Lambda functions to trigger various resources
  • Led the migration strategy from SAP Data Warehouse to AWS Redshift, performing impact analyses and providing feedback on mapping changes
  • Developed project estimates and coordinated project progress to ensure timely delivery.

Work Preference

Work Type

Full Time, Contract Work

Work Location

On-Site, Remote

