
Aishwarya M

Summary

  • Around 9 years of IT experience with Amazon Web Services (Amazon EC2, Amazon S3, AWS Lambda, Amazon CloudWatch, Elastic Load Balancing, Amazon SimpleDB, Amazon RDS, Elasticsearch, Amazon MQ, Amazon SQS, AWS Identity and Access Management, Amazon EBS, and AWS CloudFormation).
  • Experience working with AWS CodePipeline to deploy Docker containers in AWS ECS using services like CloudFormation, CodeBuild, and CodeDeploy.
  • Capable of using AWS utilities such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs on AWS.
  • Experienced in automating, configuring, and deploying instances on AWS and Azure environments and in data centers, and in managing security groups on AWS.
  • Good knowledge of technologies for systems that handle massive amounts of data running in highly distributed mode on Cloudera and Hortonworks Hadoop distributions and on Amazon AWS.
  • Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Experience designing Azure cloud architecture and implementation plans for hosting complex application workloads on Microsoft Azure.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Experience with Windows Azure IaaS: Virtual Networks, Virtual Machines, Cloud Services, Resource Groups, ExpressRoute, Traffic Manager, VPN, Load Balancing, Application Gateways, and Autoscaling.
  • Experience building Power BI reports on Azure Analysis Services for better performance compared with direct query using GCP BigQuery.
  • Good understanding of Big Data Hadoop and YARN architecture along with various Hadoop daemons such as Job Tracker, Task Tracker, NameNode, DataNode, and Resource/Cluster Manager, as well as Kafka (distributed stream processing).
  • Hands-on experience using Hadoop ecosystem components such as Hadoop, Hive, Pig, Sqoop, HBase, Cassandra, Spark, Spark Streaming, Spark SQL, Oozie, Zookeeper, Kafka, Flume, the MapReduce framework, YARN, Scala, and Hue.
  • Experience importing and exporting data with Sqoop between HDFS and RDBMS and migrating data according to client requirements.
  • Ingested data into the Snowflake cloud data warehouse using Snowpipe; extensive experience with micro-batching to ingest millions of files into Snowflake as they arrive in the staging area.
  • Results-driven individual with a solid track record of delivering quality work. Known for excellent communication and teamwork abilities, with a commitment to achieving company goals and delivering exceptional service. Passionate about continuous learning and professional development.

Overview

9 years of professional experience

Work History

Senior Big Data Engineer

Optum
Minneapolis, MN
02.2023 - Current
  • Extensive experience in working with AWS cloud Platform (EC2, S3, EMR, Redshift, Lambda and Glue)
  • Involved in designing and deploying multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling via AWS CloudFormation
  • Pulled data from SQL Server, Teradata, Amazon S3 buckets, and internal SFTP and loaded it into the AWS S3 data warehouse bucket
  • Used the AWS Glue ETL and DataBrew services to consume raw data from S3 buckets, transform it per requirements, and write the output back to S3 in Parquet format for data analytics
  • Led the testing efforts in support of projects/programs across a large landscape of technologies (Unix, AngularJS, AWS, Sauce Labs, Cucumber JVM, MongoDB, GitHub, Bitbucket, SQL, and NoSQL databases)
  • Used AWS data pipeline for Data Extraction, Transformation and Loading from homogeneous or heterogeneous data sources and built various graphs for business decision-making using Python Matplotlib
  • Extensive expertise using the core Spark APIs and processing data on an EMR cluster; worked on an ETL pipeline to source these tables and deliver the calculated ratio data from AWS to the Datamart (SQL Server) and Credit Edge server
  • Stored data in AWS S3, used like HDFS, and ran EMR (Elastic MapReduce) jobs on the data stored in S3
  • Experience with AWS Glue, a fully managed ETL service that makes it easy to move data between data stores, and with managing ETL jobs that automate data processing and reporting
  • Created numerous ODI interfaces and loaded data into Snowflake DB
  • Worked on Amazon Redshift to consolidate multiple data warehouses into a single data warehouse
  • Used Spark with Scala to import customer information data from an Oracle database into HDFS for data processing, along with minor cleansing
  • Optimized Hive queries using best practices and the right parameters, along with technologies such as Hadoop, YARN, Python, and PySpark
  • Built a learner data model that gets data from Kafka in real time and persists it to Cassandra
  • Developed a Kafka consumer API in Python for consuming data from Kafka topics
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format on HDFS (see the sketch after this list)
  • Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system
  • Configured Spark Streaming to receive ongoing information from Kafka and store the stream data in HDFS
  • Worked on reading and writing multiple data formats like JSON, ORC, parquet on HDFS using Pyspark
  • Managed Zookeeper for cluster co-ordination and Kafka Offset monitoring
  • Managed, structured, and organized ETL pipelines in Apache Airflow using Directed Acyclic Graphs (DAGs)
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive
  • Launched and set up a Hadoop/HBase cluster, which included configuring the different components of the Hadoop and HBase cluster
  • Hands on experience in loading data from UNIX file system to HDFS
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data from HBase through Sqoop and placing it in HDFS for further processing
  • Worked on Cloudera to analyze data present on top of HDFS
  • Worked extensively on Hive and PIG
  • Automated data loading into the Hadoop Distributed File System with Oozie
  • Scheduling Batch jobs through AWS Batch and performing Data processing jobs by leveraging Apache Spark APIs
  • Extensively used the advanced features of PL/SQL like Records, Tables, Object types and Dynamic SQL
  • Analyzed data to identify, investigate and report trends linked to fraudulent transactions and claims
  • Created and worked on SQOOP jobs with incremental load to populate Hive External tables
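
The Kafka and Spark Streaming bullets above follow a common pattern: subscribe to a topic, shape the payload into a DataFrame, and persist it as Parquet on HDFS. Below is a minimal PySpark Structured Streaming sketch of that pattern; the broker address, topic name, and paths are hypothetical placeholders rather than values from this role, and it assumes the spark-sql-kafka connector package is available on the cluster.

# Minimal PySpark Structured Streaming sketch: read a Kafka topic and
# persist the payload as Parquet on HDFS. Broker, topic, and paths are
# hypothetical placeholders; requires the spark-sql-kafka connector package.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-to-hdfs-parquet").getOrCreate()

# Subscribe to the Kafka topic as a streaming DataFrame.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "events")                     # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; keep the value as a string column.
events = raw.select(col("value").cast("string").alias("payload"), col("timestamp"))

# Write the stream to HDFS in Parquet format with checkpointing for recovery.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/events/parquet")             # placeholder path
    .option("checkpointLocation", "hdfs:///checkpoints/events")
    .outputMode("append")
    .start()
)

query.awaitTermination()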

Azure Data Engineer

Cognizant Technologies
AR
08.2020 - 11.2021
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics)
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks
  • Analyze, design, and build Modern data solutions using Azure PaaS service to support visualization of data
  • Understand current Production state of application and determine the impact of new implementation on existing business processes
  • Deployed the initial Azure components like Azure Virtual Networks, Azure Application Gateway, Azure Storage and Affinity groups
  • Developed JSON Scripts for deploying the Pipeline in Azure Data Factory (ADF) that process the data using the SQL Activity
  • Build data pipelines in airflow in GCP for ETL related jobs using different airflow operators
  • Experience in GCP Dataproc, GCS, Cloud functions, BigQuery
  • Experience in moving data between GCP and Azure using Azure Data Factory
  • Created Databricks notebooks using Python (PySpark), Scala, and Spark SQL to transform data stored in Azure Data Lake Storage Gen2 from the Raw to the Stage and Curated zones (see the sketch after this list)
  • Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, as well as to write data back
  • Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS
  • Experience on end-to-end implementation of Snowflake cloud data warehouse
  • Expertise in Snowflake - data modelling, ELT using Snowflake SQL, implementing complex stored Procedures and standard DWH and ETL concepts
  • Design and implement multiple ETL solutions with various data sources by extensive SQL Scripting, ETL tools, Python, Shell Scripting, and scheduling tools
  • Data profiling and data wrangling of XML, Web feeds and file handling using python, Unix, and SQL
  • Used Sqoop to channel data between HDFS and different RDBMS sources
  • Developed Spark applications using Pyspark and Spark-SQL for data extraction, transformation, and aggregation from multiple file formats
  • Developed automated regression scripts for validation of the ETL process between multiple databases, including AWS Redshift, Oracle, MongoDB, T-SQL, and SQL Server, using Python
  • Used Apache Spark Data frames, Spark-SQL, Spark MLlib extensively and developing and designing POC's using Scala, Spark SQL and MLlib libraries
  • Work on data that was a combination of unstructured and structured data from multiple sources and automate the cleaning using Python scripts
  • Worked as a Data Engineer with Hadoop ecosystem components such as HBase, Sqoop, Zookeeper, Oozie, Hive, and Pig on the Cloudera Hadoop distribution
  • Tackled a highly imbalanced fraud dataset using undersampling with ensemble methods, oversampling, and cost-sensitive algorithms
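
The Databricks work above includes PySpark notebooks that move data from the Raw zone to the Stage zone of Azure Data Lake Storage Gen2. The sketch below illustrates that Raw-to-Stage step; the storage account, container names, columns, and cleansing rules are hypothetical stand-ins, and it assumes ADLS credentials are already configured for the Spark session.

# Minimal PySpark sketch of a Raw-to-Stage transformation on ADLS Gen2.
# Storage account, containers, and columns are hypothetical; assumes the
# cluster already has credentials configured for the abfss:// paths.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date, trim

spark = SparkSession.builder.appName("raw-to-stage").getOrCreate()

raw_path = "abfss://raw@examplelake.dfs.core.windows.net/sales/"      # placeholder
stage_path = "abfss://stage@examplelake.dfs.core.windows.net/sales/"  # placeholder

# Read raw CSV files (schema inference used here only for brevity).
raw_df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv(raw_path)
)

# Light cleansing: trim strings, normalize the date column, drop duplicates.
stage_df = (
    raw_df
    .withColumn("customer_name", trim(col("customer_name")))
    .withColumn("order_date", to_date(col("order_date"), "yyyy-MM-dd"))
    .dropDuplicates(["order_id"])
)

# Persist to the Stage zone as Parquet, partitioned by order date.
stage_df.write.mode("overwrite").partitionBy("order_date").parquet(stage_path)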

Python Developer

Kohl's
Neenah, USA
09.2018 - 06.2020
  • Collaborated with compliance and surveillance teams to build data pipelines for Market Abuse Regulation, Swift messages, and legal documents, improving throughput metrics.
  • Created RDDs and DataFrames in Spark and HiveQL for proper schema design of imported data, improving data processing.
  • Extracted raw data from upstream applications using Spark and Sqoop, streamlining data flow and reducing latency by 40%.
  • Built and scheduled data pipelines for mainframe datasets using Spark, ensuring smooth data flow and reducing data processing time.
  • Developed monitoring dashboards on Cloudera Manager for KPI engines, improving fault tolerance and enabling quick workarounds to increase system uptime
  • Automated file delivery checks and triggered email alerts using Unix Bash scripting, improving file integrity checks.
  • Created Unix shell scripts for file recovery from mainframe datasets, minimizing errors, and reducing manual intervention.
  • Diagnosed and fixed application errors, reducing downtime by 30%, and preventing future anomalies.
  • Built analytics with Python for IBM MQ messages, Yarn health checks, and KPI monitoring, displaying results on Kibana, increasing data visibility and decision-making by 40%.
  • Designed Kafka producers and consumers for high-speed data streams, meeting SLAs for data delivery, and reducing data processing time.
  • Utilized Amazon S3 for data warehousing, ensuring scalability, and reducing storage costs.
  • Integrated KPI engines with visualization tools to enhance business analytics, improving reporting efficiency by 30%.
  • Developed REST APIs for Hadoop applications to improve system integration and data communication speed.
  • Performed disaster recovery tests and ensured continuity during quarterly shutdown activities, reducing downtime risks.
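
The Kafka producer and consumer work in this role can be illustrated with a short, self-contained Python sketch. The kafka-python client is an assumption (no specific library is named above), and the broker address, topic name, and message fields are hypothetical placeholders.

# Minimal Kafka producer/consumer sketch using the kafka-python library.
# Library choice, broker, topic, and message fields are assumptions.
import json

from kafka import KafkaConsumer, KafkaProducer

BROKER = "localhost:9092"  # placeholder broker
TOPIC = "trades"           # placeholder topic

# Producer: serialize dictionaries as JSON and send them to the topic.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"symbol": "ABC", "qty": 100, "price": 12.5})
producer.flush()

# Consumer: read from the earliest offset and deserialize the JSON payload.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    consumer_timeout_ms=5000,  # stop iterating if no new messages arrive
)
for message in consumer:
    print(message.value)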

Data Analyst

jhhjj
Hyderabad, India
08.2016 - 07.2018
  • Gathered and analyzed business requirements to design and develop web applications using Python, Django, and JavaScript.
  • Designed and optimized database schemas using Django’s MVT framework, MySQL, and Cassandra, enhancing query performance and reducing data retrieval time.
  • Implemented Apache Storm topologies for real-time data extraction and processing, reducing data latency and improving ETL efficiency by 35%.
  • Analyzed and formatted data using machine learning algorithms with Python’s Scikit-Learn, improving data processing efficiency by 30%.
  • Developed backend services using Django, Python, and REST web services, optimizing ORM queries to enhance API response times.
  • Created, activated, and managed Anaconda environments for seamless development and execution of ML models, enhancing model deployment speed by 25%.
  • Built dynamic and responsive user interfaces using JavaScript, AJAX, JSON, jQuery, HTML5, and CSS3, increasing user engagement and loading speed.
  • Optimized Python code performance, implemented multithreading, and enhanced database query execution, improving system efficiency by 40%.
  • Automated system maintenance with shell scripting and CRON jobs, reducing manual intervention, and improving deployment efficiency.
  • Implemented unit and functional testing using Python’s unittest, unittest2, mock, and custom frameworks, ensuring software reliability and increasing testing efficiency.
  • Developed reusable ORM-based solutions, simplified complex SQL queries, and monitored application health using JIRA, accelerating query execution, and improving Agile workflow.
  • Built machine learning models using Python, R, and MATLAB, managing large datasets with Pandas, and optimizing data processing speed, while improving business decision-making through data visualization.
  • Worked in an Agile development environment, integrating testing frameworks with CI/CD pipelines for faster software delivery and improved team collaboration by 40%.
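
The unit-testing bullet above mentions Python's unittest and mock; the sketch below shows that style of test in a self-contained form. The function under test and the mocked client are hypothetical examples, not code from the project.

# Minimal unittest + mock sketch. The function under test is inlined as a
# hypothetical example so the file is self-contained and runnable.
import unittest
from unittest import mock


def fetch_user_count(client):
    """Hypothetical function under test: counts rows returned by an API client."""
    rows = client.get("/users")
    return len(rows)


class FetchUserCountTest(unittest.TestCase):
    def test_counts_rows_returned_by_client(self):
        # Replace the real client with a mock that returns three fake rows.
        fake_client = mock.Mock()
        fake_client.get.return_value = [{"id": 1}, {"id": 2}, {"id": 3}]

        self.assertEqual(fetch_user_count(fake_client), 3)
        fake_client.get.assert_called_once_with("/users")


if __name__ == "__main__":
    unittest.main()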

Education

Master of Science - COMPUTER SCIENCE

UNIVERSITY OF BRIDGEPORT
Bridgeport, CT
04-2023

Skills

  • Apache Spark
  • HDFS
  • MapReduce
  • HIVE
  • SQOOP
  • Oozie
  • Zookeeper
  • Kafka
  • Flume
  • Splunk
  • Python
  • Java
  • Scala
  • R
  • SQL
  • PL/SQL
  • Linux Shell Script
  • HiveQL
  • Oracle 11g
  • Oracle 10g
  • MySQL
  • Teradata
  • MS-SQL Server
  • DB2
  • HTML
  • XML
  • JDBC
  • JSP
  • CSS
  • JavaScript
  • SOAP
  • DataDog
  • Eclipse
  • IntelliJ
  • NetBeans
  • PuTTY
  • WinSCP
  • Linux
  • Unix
  • Windows
  • Mac OS-X
  • CentOS
  • Red Hat
  • Agile
  • Scrum
  • Waterfall
  • Cloudera
  • Hortonworks
  • MapR
  • AWS
  • Amazon S3
  • Amazon EC2
  • Amazon EMR
  • AWS Lambda
  • AWS Glue
  • Amazon Athena
  • Amazon Redshift
  • RBD
  • Azure
  • Azure Data Lake
  • Azure Data Factory
  • Azure Databricks
  • Azure SQL Database
  • Azure SQL Data Warehouse
  • GCP
  • Hadoop
  • Spark
  • Django
  • Flask
  • Informatica PowerCenter
  • AWS Glue
  • Data Management
  • Oracle Data Integrator

Timeline

Senior Big Data Engineer

Optum
02.2023 - Current

Azure Data Engineer

Cognizant Technologies
08.2020 - 11.2021

Python Developer

Kohl's
09.2018 - 06.2020

Data Analyst

jhhjj
08.2016 - 07.2018

Master of Science - COMPUTER SCIENCE

UNIVERSITY OF BRIDGEPORT