
Amrutha M

Princeton, NJ

Summary

  • 4 years of professional IT experience, with expertise in big data on AWS cloud services such as S3, Auto Scaling, Glue, EMR, EC2, Lambda, Step Functions, CloudWatch, CloudFormation, Athena, DynamoDB, and Redshift
  • Strong experience in core Python, SQL, PL/SQL, and RESTful web services; 1 year of Java experience
  • Expertise with Hadoop-ecosystem infrastructure such as MapReduce, Pig, Hive, ZooKeeper, Airflow, Snowflake, and Spark for data storage and analysis
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Python (see the sketch below)
  • Strong understanding of data warehouse modeling and of reporting and analytics platforms such as Snowflake
  • Developed robust ETL pipelines leveraging Apache Spark and Airflow for efficient data ingestion, transformation, and loading
  • Collaborated with the engineering team to design and develop SQL stored procedures that automate data collection and preprocessing
  • Created dashboards with Power BI and Tableau to deliver data-driven insights that inform business decisions
  • Hands-on experience with code versioning, automation, and workflow orchestration tools such as GitHub, Ansible, SLURM, Airflow, and Terraform
  • Experienced in monitoring database performance, troubleshooting issues, and optimizing database environments
  • Strong analytical and problem-solving skills, a deep understanding of database technologies and systems, and excellent verbal and written communication; equally confident working independently or as part of a team
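
A minimal sketch of the Hive/SQL-to-Spark conversion noted above, assuming a registered Hive table named orders with customer_id and amount columns (hypothetical names, not taken from the resume itself):

    # Hive/SQL form of the query:
    #   SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive-to-spark")
             .enableHiveSupport()
             .getOrCreate())

    # The same aggregate expressed as a Spark DataFrame transformation
    totals = (spark.table("orders")
              .groupBy("customer_id")
              .agg(F.sum("amount").alias("total")))
    totals.show()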

Overview

5 years of professional experience

Work History

Data Engineer

AT&T
05.2022 - Current
  • Used Agile software development methodology in defining problems, gathering requirements, running development iterations, business modeling, and communicating with the technical team during development of the system
  • Built self-service data pipelines using AWS services such as SNS, Step Functions, Lambda, Glue, EMR, EC2, Athena, SageMaker, QuickSight, and Redshift
  • Moved large amounts of data from AWS S3 buckets to AWS Redshift using Glue and EMR
  • Analyzed large and critical datasets using EMR, Glue and Spark
  • Wrote live real-time processing and core jobs using Spark Streaming, with Kafka as the data pipeline system
  • Implemented scalable solutions for data preprocessing, feature engineering, and model training, utilizing Python libraries such as pandas, NumPy, and scikit-learn
  • Developed ETL pipelines in and out of data warehouses using tools like Python and AWS Glue
  • Implemented Spark using Python and Spark SQL for faster testing and processing of data
  • Consumed data from Kafka using Apache Spark (see the sketch after this list)
  • Worked with container systems such as Docker, container orchestration via EC2 Container Service and Kubernetes, and infrastructure provisioning with Terraform
  • Leveraged cloud computing platforms such as AWS and Google Cloud for scalable infrastructure and storage, enabling high-performance data processing and model training
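
A minimal sketch of the Kafka consumption pattern referenced in the list above, using Spark Structured Streaming; the broker address, topic name, and payload schema are hypothetical placeholders, and the job assumes the spark-sql-kafka connector package is on the classpath:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("kafka-consumer").getOrCreate()

    # Hypothetical JSON payload schema
    schema = StructType([
        StructField("event_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
           .option("subscribe", "events")                     # placeholder topic
           .load())

    # Kafka values arrive as bytes; cast to string, then parse the JSON payload
    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(F.from_json("json", schema).alias("e"))
              .select("e.*"))

    # Console sink stands in for the real downstream system
    query = events.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()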

Data Engineer

GP Technologies
05.2019 - 08.2021
  • Created and maintained reporting infrastructure to facilitate visual representation of manufacturing data for operations planning and execution
  • Extracted, transformed, and loaded data from source systems to Azure Data Storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics
  • Implemented a RESTful web service to interact with the Redis cache framework
  • Handled data intake through Sqoop and ingestion through MapReduce and HBase
  • Extensively worked on Spark Streaming and Apache Kafka to fetch live stream data
  • Applied machine-learning techniques (regression and classification) to predict outcomes
  • Designed the architecture for a common Cloud Composer setup across projects as part of a GCP cost-reduction project
  • Constructed product-usage SDK data and data aggregations using PySpark and Scala
  • Used Spark SQL and HiveContext on partitioned Hive external tables maintained in an AWS S3 location for reporting, data science dashboarding, and ad-hoc analyses (see the sketch after this list)
  • Involved in data processing using an ETL pipeline orchestrated by AWS Data Pipeline using Hive
  • Installed Kafka Manager to track consumer lag and monitor Kafka metrics, and used it for adding topics, partitions, etc.
  • Created configuration files to deploy SSIS packages across all environments
  • Wrote SQL and R queries to extract, transform, and load (ETL) data from large datasets using data staging
  • Provisioned virtual clusters on AWS, including services such as Redshift, Glue, and EC2
  • Implemented CI/CD pipelines using Jenkins and built and deployed the applications
  • Developed RESTful endpoints to cache application-specific data in in-memory data clusters such as Redis
  • Created Databricks notebooks using SQL and Python and automated them with jobs
  • Interacted with other data scientists and architected custom solutions for data visualization using tools like Tableau and packages in R
  • Developed predictive models using Python and R to predict customer churn and classify customers
  • Documented best practices and the target approach for the CI/CD pipeline
  • Coordinated with the QA team in preparing for compatibility testing of the Guidewire solution
  • Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing
  • Designed and configured topics in the new Kafka cluster across all environments
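
A minimal sketch of the partitioned external-table pattern referenced in this list, with hypothetical S3 paths and column names; the write lays out Parquet partitions that a Hive external table (or Athena) can point at:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("usage-aggregation")
             .enableHiveSupport()
             .getOrCreate())

    # Hypothetical raw product-usage data
    usage = spark.read.parquet("s3://example-bucket/raw/usage/")

    daily = (usage
             .groupBy("event_date", "product")
             .agg(F.countDistinct("user_id").alias("active_users")))

    # Partitioning by date lets Hive/Athena queries prune partitions at read time
    (daily.write
     .mode("overwrite")
     .partitionBy("event_date")
     .parquet("s3://example-bucket/curated/usage_daily/"))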

Education

Master’s Degree - Computer Science

Rivier University
Nashua, NH

Bachelor’s Degree - Computer Science

Sathyabama University
Chennai, India

Skills

  • ETL development
  • Data Warehousing
  • Data Modeling
  • Data Pipeline Design
  • Data Migration
  • Big Data Processing
  • Scripting Languages
  • Spark Framework
  • SQL Expertise
  • Machine Learning
  • Data Governance
  • Real-time Analytics
  • NoSQL Databases
  • Data Security
  • API Development
  • Data Quality Assurance
  • Hadoop Ecosystem
  • Metadata Management
  • Data Analysis and Analytics
  • RDBMS
  • Data Mining
  • Secure Data Retention
  • Data repositories
  • Security Protocols

Academic Projects

Visualization and Analysis of Airbnb Listings in NYC and Austin (Tableau, ETL, Excel, Python)
  • Performed data cleansing on more than 100,000 records with an ETL process using Microsoft Excel
  • Created dashboards, stories, metrics, and reports in Tableau to derive insights for Airbnb listings
  • Analyzed factors affecting the price of an Airbnb and performed regression in Python, finding that the presence of a washer/dryer in a listing affects 37% of its price (see the sketch below)
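
A minimal sketch of that regression step, assuming a cleaned listings CSV with a binary washer_dryer column; the file name and feature names are hypothetical stand-ins for the project's actual data:

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    listings = pd.read_csv("listings_clean.csv")  # hypothetical cleaned export

    # Binary amenity flag plus a few hypothetical controls
    X = listings[["washer_dryer", "bedrooms", "accommodates"]]
    y = listings["price"]

    model = LinearRegression().fit(X, y)

    # The washer_dryer coefficient estimates the amenity's price effect
    # while holding the other features fixed
    print(dict(zip(X.columns, model.coef_)))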
