
Amrutha M

Princeton, NJ

Summary

  • 4 years of professional IT experience, with expertise in big data on AWS cloud services such as S3, Auto Scaling, Glue, EMR, EC2, Lambda, Step Functions, CloudWatch, CloudFormation, Athena, DynamoDB, and Redshift
  • Strong experience in core Python, SQL, PL/SQL, and RESTful web services; 1 year of Java experience
  • Expertise with Hadoop-ecosystem infrastructure such as MapReduce, Pig, Hive, ZooKeeper, Airflow, Snowflake, and Spark for data storage and analysis
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Python (see the sketch below)
  • Strong understanding of data warehouse modeling and of reporting and analytics platforms such as Snowflake
  • Developed robust ETL pipelines leveraging Apache Spark and Airflow for efficient data ingestion, transformation, and loading
  • Collaborated with the engineering team to design and develop SQL stored procedures that automate data collection and preprocessing
  • Created dashboards with Power BI and Tableau to deliver data-driven insights that inform business decisions
  • Hands-on experience with code versioning, automation, and workflow orchestration tools such as GitHub, Ansible, SLURM, Airflow, and Terraform
  • Experienced in monitoring database performance, troubleshooting issues, and optimizing database environments
  • Strong analytical and problem-solving skills, a deep understanding of database technologies and systems, and excellent verbal and written communication; equally confident working independently or as part of a team
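
A minimal sketch of the Hive/SQL-to-Spark conversion noted above, assuming a registered Hive table named orders with customer_id and amount columns (hypothetical names, not taken from the resume itself):

    # Hive/SQL form of the query:
    #   SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive-to-spark")
             .enableHiveSupport()
             .getOrCreate())

    # The same aggregate expressed as a Spark DataFrame transformation
    totals = (spark.table("orders")
              .groupBy("customer_id")
              .agg(F.sum("amount").alias("total")))
    totals.show()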

Overview

5 years of professional experience

Work History

Data Engineer

AT&T
05.2022 - Current
  • Used Agile software development methodology in defining problems, gathering requirements, running development iterations, business modeling, and communicating with the technical team during development of the system
  • Built self-service data pipelines using AWS services such as SNS, Step Functions, Lambda, Glue, EMR, EC2, Athena, SageMaker, QuickSight, and Redshift
  • Moved large amounts of data from AWS S3 buckets to AWS Redshift using Glue and EMR
  • Analyzed large and critical datasets using EMR, Glue and Spark
  • Wrote live real-time processing and core jobs using Spark Streaming, with Kafka as the data pipeline system
  • Implemented scalable solutions for data preprocessing, feature engineering, and model training, utilizing Python libraries such as pandas, NumPy, and scikit-learn
  • Developed ETL pipelines in and out of data warehouses using tools like Python and AWS Glue
  • Implemented Spark using Python and Spark SQL for faster testing and processing of data
  • Consumed data from Kafka using Apache Spark (see the sketch after this list)
  • Worked with container systems such as Docker, container orchestration via EC2 Container Service and Kubernetes, and infrastructure provisioning with Terraform
  • Leveraged cloud computing platforms such as AWS and Google Cloud for scalable infrastructure and storage, enabling high-performance data processing and model training
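
A minimal sketch of the Kafka consumption pattern referenced in the list above, using Spark Structured Streaming; the broker address, topic name, and payload schema are hypothetical placeholders, and the job assumes the spark-sql-kafka connector package is on the classpath:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("kafka-consumer").getOrCreate()

    # Hypothetical JSON payload schema
    schema = StructType([
        StructField("event_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
           .option("subscribe", "events")                     # placeholder topic
           .load())

    # Kafka values arrive as bytes; cast to string, then parse the JSON payload
    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(F.from_json("json", schema).alias("e"))
              .select("e.*"))

    # Console sink stands in for the real downstream system
    query = events.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()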

Data Engineer

GP Technologies
05.2019 - 08.2021
  • Created and maintained reporting infrastructure to facilitate visual representation of manufacturing data for operations planning and execution
  • Extracted, transformed, and loaded data from source systems to Azure Data Storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics
  • Implemented a RESTful web service to interact with the Redis cache framework
  • Handled data intake through Sqoop and ingestion through MapReduce and HBase
  • Extensively worked on Spark Streaming and Apache Kafka to fetch live stream data
  • Applied machine-learning techniques (regression and classification) to predict outcomes
  • Designed the architecture for a common Cloud Composer setup across projects as part of a GCP cost-reduction project
  • Constructed product-usage SDK data and data aggregations using PySpark and Scala
  • Used Spark SQL and HiveContext on partitioned Hive external tables maintained in an AWS S3 location for reporting, data science dashboarding, and ad-hoc analyses (see the sketch after this list)
  • Involved in data processing using an ETL pipeline orchestrated by AWS Data Pipeline using Hive
  • Installed Kafka Manager to track consumer lag and monitor Kafka metrics, and used it for adding topics, partitions, etc.
  • Created configuration files to deploy SSIS packages across all environments
  • Wrote SQL and R queries to extract, transform, and load (ETL) data from large datasets using data staging
  • Provisioned virtual clusters on AWS, including services such as Redshift, Glue, and EC2
  • Implemented CI/CD pipelines using Jenkins and built and deployed the applications
  • Developed RESTful endpoints to cache application-specific data in in-memory data clusters such as Redis
  • Created Databricks notebooks using SQL and Python and automated them with jobs
  • Interacted with other data scientists and architected custom solutions for data visualization using tools like Tableau and packages in R
  • Developed predictive models using Python and R to predict customer churn and classify customers
  • Documented best practices and the target approach for the CI/CD pipeline
  • Coordinated with the QA team in preparing for compatibility testing of the Guidewire solution
  • Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing
  • Designed and configured topics in the new Kafka cluster across all environments
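
A minimal sketch of the partitioned external-table pattern referenced in this list, with hypothetical S3 paths and column names; the write lays out Parquet partitions that a Hive external table (or Athena) can point at:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("usage-aggregation")
             .enableHiveSupport()
             .getOrCreate())

    # Hypothetical raw product-usage data
    usage = spark.read.parquet("s3://example-bucket/raw/usage/")

    daily = (usage
             .groupBy("event_date", "product")
             .agg(F.countDistinct("user_id").alias("active_users")))

    # Partitioning by date lets Hive/Athena queries prune partitions at read time
    (daily.write
     .mode("overwrite")
     .partitionBy("event_date")
     .parquet("s3://example-bucket/curated/usage_daily/"))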

Education

Master’s Degree - Computer Science

Rivier University
Nashua, NH

Bachelor’s Degree - Computer Science

Sathyabama University
Chennai, India

Skills

  • ETL development
  • Data Warehousing
  • Data Modeling
  • Data Pipeline Design
  • Data Migration
  • Big Data Processing
  • Scripting Languages
  • Spark Framework
  • SQL Expertise
  • Machine Learning
  • Data Governance
  • Real-time Analytics
  • NoSQL Databases
  • Data Security
  • API Development
  • Data Quality Assurance
  • Hadoop Ecosystem
  • Metadata Management
  • Data Analysis and Analytics
  • RDBMS
  • Data Mining
  • Secure Data Retention
  • Data repositories
  • Security Protocols

Academic Projects

Visualization and Analysis of Airbnb Listings in NYC and Austin (Tableau, ETL, Excel, Python)
  • Performed data cleansing on more than 100,000 records with an ETL process using Microsoft Excel
  • Created dashboards, stories, metrics, and reports in Tableau to derive insights for Airbnb listings
  • Analyzed factors affecting the price of an Airbnb and performed regression in Python, finding that the presence of a washer/dryer in a listing affects 37% of its price (see the sketch below)
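
A minimal sketch of that regression step, assuming a cleaned listings CSV with a binary washer_dryer column; the file name and feature names are hypothetical stand-ins for the project's actual data:

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    listings = pd.read_csv("listings_clean.csv")  # hypothetical cleaned export

    # Binary amenity flag plus a few hypothetical controls
    X = listings[["washer_dryer", "bedrooms", "accommodates"]]
    y = listings["price"]

    model = LinearRegression().fit(X, y)

    # The washer_dryer coefficient estimates the amenity's price effect
    # while holding the other features fixed
    print(dict(zip(X.columns, model.coef_)))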
