
Aishwarya M

Summary

  • Around 9 years of IT experience with Amazon Web Services (Amazon EC2, Amazon S3, AWS Lambda, Amazon CloudWatch, Elastic Load Balancing, Amazon SimpleDB, Amazon RDS, Elasticsearch, Amazon MQ, Amazon SQS, AWS Identity and Access Management, Amazon EBS, and AWS CloudFormation).
  • Experience working with AWS CodePipeline to deploy Docker containers in AWS ECS using services like CloudFormation, CodeBuild, and CodeDeploy.
  • Capable of using AWS utilities such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs on AWS.
  • Experienced in automating, configuring, and deploying instances on AWS and Azure environments and in data centers, and in managing security groups on AWS.
  • Good knowledge of technologies for systems that handle massive amounts of data running in highly distributed mode on Cloudera and Hortonworks Hadoop distributions and on Amazon AWS.
  • Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Experience designing Azure cloud architecture and implementation plans for hosting complex application workloads on Microsoft Azure.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Experience with Windows Azure IaaS: Virtual Networks, Virtual Machines, Cloud Services, Resource Groups, ExpressRoute, Traffic Manager, VPN, Load Balancing, Application Gateways, and Autoscaling.
  • Experience building Power BI reports on Azure Analysis Services for better performance compared with direct query using GCP BigQuery.
  • Good understanding of Big Data Hadoop and YARN architecture along with various Hadoop daemons such as Job Tracker, Task Tracker, NameNode, DataNode, and Resource/Cluster Manager, as well as Kafka (distributed stream processing).
  • Hands-on experience using Hadoop ecosystem components such as Hadoop, Hive, Pig, Sqoop, HBase, Cassandra, Spark, Spark Streaming, Spark SQL, Oozie, Zookeeper, Kafka, Flume, the MapReduce framework, YARN, Scala, and Hue.
  • Experience importing and exporting data with Sqoop between HDFS and RDBMS and migrating data according to client requirements.
  • Ingested data into the Snowflake cloud data warehouse using Snowpipe; extensive experience with micro-batching to ingest millions of files into Snowflake as they arrive in the staging area.
  • Results-driven individual with a solid track record of delivering quality work. Known for excellent communication and teamwork abilities, with a commitment to achieving company goals and delivering exceptional service. Passionate about continuous learning and professional development.

Overview

9 years of professional experience

Work History

Senior Big Data Engineer

Optum
Minneapolis, MN
02.2023 - Current
  • Extensive experience in working with AWS cloud Platform (EC2, S3, EMR, Redshift, Lambda and Glue)
  • Involved in designing and deploying multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling via AWS CloudFormation
  • Pulled data from SQL Server, Teradata, Amazon S3 buckets, and internal SFTP and loaded it into the AWS S3 data warehouse bucket
  • Used the AWS Glue ETL and DataBrew services to consume raw data from S3 buckets, transform it per requirements, and write the output back to S3 in Parquet format for data analytics
  • Led the testing efforts in support of projects/programs across a large landscape of technologies (Unix, AngularJS, AWS, Sauce Labs, Cucumber JVM, MongoDB, GitHub, Bitbucket, SQL, and NoSQL databases)
  • Used AWS data pipeline for Data Extraction, Transformation and Loading from homogeneous or heterogeneous data sources and built various graphs for business decision-making using Python Matplotlib
  • Extensive expertise using the core Spark APIs and processing data on an EMR cluster; worked on an ETL pipeline to source these tables and deliver the calculated ratio data from AWS to the Datamart (SQL Server) and Credit Edge server
  • Stored data in AWS S3, used like HDFS, and ran EMR (Elastic MapReduce) jobs on the data stored in S3
  • Experience with AWS Glue, a fully managed ETL service that makes it easy to move data between data stores, and with managing ETL jobs that automate data processing and reporting
  • Created numerous ODI interfaces and loaded data into Snowflake DB
  • Worked on Amazon Redshift to consolidate multiple data warehouses into a single data warehouse
  • Used Spark with Scala to import customer information data from an Oracle database into HDFS for data processing, along with minor cleansing
  • Optimized Hive queries using best practices and the right parameters, along with technologies such as Hadoop, YARN, Python, and PySpark
  • Built a learner data model that gets data from Kafka in real time and persists it to Cassandra
  • Developed a Kafka consumer API in Python for consuming data from Kafka topics
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format on HDFS (see the sketch after this list)
  • Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system
  • Configured Spark Streaming to receive ongoing information from Kafka and store the stream data in HDFS
  • Worked on reading and writing multiple data formats like JSON, ORC, parquet on HDFS using Pyspark
  • Managed Zookeeper for cluster co-ordination and Kafka Offset monitoring
  • Managed, structured, and organized ETL pipelines in Apache Airflow using Directed Acyclic Graphs (DAGs)
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive
  • Launched and set up a Hadoop/HBase cluster, which included configuring the different components of the Hadoop and HBase cluster
  • Hands on experience in loading data from UNIX file system to HDFS
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data from HBase through Sqoop and placing it in HDFS for further processing
  • Worked on Cloudera to analyze data present on top of HDFS
  • Worked extensively on Hive and PIG
  • Automated data loading into the Hadoop Distributed File System with Oozie
  • Scheduling Batch jobs through AWS Batch and performing Data processing jobs by leveraging Apache Spark APIs
  • Extensively used the advanced features of PL/SQL like Records, Tables, Object types and Dynamic SQL
  • Analyzed data to identify, investigate and report trends linked to fraudulent transactions and claims
  • Created and worked on SQOOP jobs with incremental load to populate Hive External tables
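
The Kafka and Spark Streaming bullets above follow a common pattern: subscribe to a topic, shape the payload into a DataFrame, and persist it as Parquet on HDFS. Below is a minimal PySpark Structured Streaming sketch of that pattern; the broker address, topic name, and paths are hypothetical placeholders rather than values from this role, and it assumes the spark-sql-kafka connector package is available on the cluster.

# Minimal PySpark Structured Streaming sketch: read a Kafka topic and
# persist the payload as Parquet on HDFS. Broker, topic, and paths are
# hypothetical placeholders; requires the spark-sql-kafka connector package.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-to-hdfs-parquet").getOrCreate()

# Subscribe to the Kafka topic as a streaming DataFrame.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "events")                     # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; keep the value as a string column.
events = raw.select(col("value").cast("string").alias("payload"), col("timestamp"))

# Write the stream to HDFS in Parquet format with checkpointing for recovery.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/events/parquet")             # placeholder path
    .option("checkpointLocation", "hdfs:///checkpoints/events")
    .outputMode("append")
    .start()
)

query.awaitTermination()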

Azure Data Engineer

Cognizant Technologies
AR
08.2020 - 11.2021
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics)
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks
  • Analyze, design, and build Modern data solutions using Azure PaaS service to support visualization of data
  • Understand current Production state of application and determine the impact of new implementation on existing business processes
  • Deployed the initial Azure components like Azure Virtual Networks, Azure Application Gateway, Azure Storage and Affinity groups
  • Developed JSON Scripts for deploying the Pipeline in Azure Data Factory (ADF) that process the data using the SQL Activity
  • Build data pipelines in airflow in GCP for ETL related jobs using different airflow operators
  • Experience in GCP Dataproc, GCS, Cloud functions, BigQuery
  • Experience in moving data between GCP and Azure using Azure Data Factory
  • Created Databricks notebooks using Python (PySpark), Scala, and Spark SQL to transform data stored in Azure Data Lake Storage Gen2 from the Raw to the Stage and Curated zones (see the sketch after this list)
  • Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, as well as to write data back
  • Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS
  • Experience on end-to-end implementation of Snowflake cloud data warehouse
  • Expertise in Snowflake - data modelling, ELT using Snowflake SQL, implementing complex stored Procedures and standard DWH and ETL concepts
  • Design and implement multiple ETL solutions with various data sources by extensive SQL Scripting, ETL tools, Python, Shell Scripting, and scheduling tools
  • Data profiling and data wrangling of XML, Web feeds and file handling using python, Unix, and SQL
  • Used Sqoop to channel data between HDFS and different RDBMS sources
  • Developed Spark applications using Pyspark and Spark-SQL for data extraction, transformation, and aggregation from multiple file formats
  • Developed automated regression scripts for validation of the ETL process between multiple databases, including AWS Redshift, Oracle, MongoDB, T-SQL, and SQL Server, using Python
  • Used Apache Spark Data frames, Spark-SQL, Spark MLlib extensively and developing and designing POC's using Scala, Spark SQL and MLlib libraries
  • Work on data that was a combination of unstructured and structured data from multiple sources and automate the cleaning using Python scripts
  • Worked as a Data Engineer with Hadoop ecosystem components such as HBase, Sqoop, Zookeeper, Oozie, Hive, and Pig on the Cloudera Hadoop distribution
  • Tackled a highly imbalanced fraud dataset using undersampling with ensemble methods, oversampling, and cost-sensitive algorithms
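
The Databricks work above includes PySpark notebooks that move data from the Raw zone to the Stage zone of Azure Data Lake Storage Gen2. The sketch below illustrates that Raw-to-Stage step; the storage account, container names, columns, and cleansing rules are hypothetical stand-ins, and it assumes ADLS credentials are already configured for the Spark session.

# Minimal PySpark sketch of a Raw-to-Stage transformation on ADLS Gen2.
# Storage account, containers, and columns are hypothetical; assumes the
# cluster already has credentials configured for the abfss:// paths.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date, trim

spark = SparkSession.builder.appName("raw-to-stage").getOrCreate()

raw_path = "abfss://raw@examplelake.dfs.core.windows.net/sales/"      # placeholder
stage_path = "abfss://stage@examplelake.dfs.core.windows.net/sales/"  # placeholder

# Read raw CSV files (schema inference used here only for brevity).
raw_df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv(raw_path)
)

# Light cleansing: trim strings, normalize the date column, drop duplicates.
stage_df = (
    raw_df
    .withColumn("customer_name", trim(col("customer_name")))
    .withColumn("order_date", to_date(col("order_date"), "yyyy-MM-dd"))
    .dropDuplicates(["order_id"])
)

# Persist to the Stage zone as Parquet, partitioned by order date.
stage_df.write.mode("overwrite").partitionBy("order_date").parquet(stage_path)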

Python Developer

Kohl's
Neenah, USA
09.2018 - 06.2020
  • Collaborated with compliance and surveillance teams to build data pipelines for Market Abuse Regulation, Swift messages, and legal documents, improving throughput metrics.
  • Created RDDs and DataFrames in Spark and HiveQL for proper schema design of imported data, improving data processing.
  • Extracted raw data from upstream applications using Spark and Sqoop, streamlining data flow and reducing latency by 40%.
  • Built and scheduled data pipelines for mainframe datasets using Spark, ensuring smooth data flow and reducing data processing time.
  • Developed monitoring dashboards on Cloudera Manager for KPI engines, improving fault tolerance and enabling quick workarounds to increase system uptime
  • Automated file delivery checks and triggered email alerts using Unix Bash scripting, improving file integrity checks.
  • Created Unix shell scripts for file recovery from mainframe datasets, minimizing errors, and reducing manual intervention.
  • Diagnosed and fixed application errors, reducing downtime by 30%, and preventing future anomalies.
  • Built analytics with Python for IBM MQ messages, Yarn health checks, and KPI monitoring, displaying results on Kibana, increasing data visibility and decision-making by 40%.
  • Designed Kafka producers and consumers for high-speed data streams, meeting SLAs for data delivery, and reducing data processing time.
  • Utilized Amazon S3 for data warehousing, ensuring scalability, and reducing storage costs.
  • Integrated KPI engines with visualization tools to enhance business analytics, improving reporting efficiency by 30%.
  • Developed REST APIs for Hadoop applications to improve system integration and data communication speed.
  • Performed disaster recovery tests and ensured continuity during quarterly shutdown activities, reducing downtime risks.
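
The Kafka producer and consumer work in this role can be illustrated with a short, self-contained Python sketch. The kafka-python client is an assumption (no specific library is named above), and the broker address, topic name, and message fields are hypothetical placeholders.

# Minimal Kafka producer/consumer sketch using the kafka-python library.
# Library choice, broker, topic, and message fields are assumptions.
import json

from kafka import KafkaConsumer, KafkaProducer

BROKER = "localhost:9092"  # placeholder broker
TOPIC = "trades"           # placeholder topic

# Producer: serialize dictionaries as JSON and send them to the topic.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"symbol": "ABC", "qty": 100, "price": 12.5})
producer.flush()

# Consumer: read from the earliest offset and deserialize the JSON payload.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    consumer_timeout_ms=5000,  # stop iterating if no new messages arrive
)
for message in consumer:
    print(message.value)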

Data Analyst

jhhjj
Hyderabad, India
08.2016 - 07.2018
  • Gathered and analyzed business requirements to design and develop web applications using Python, Django, and JavaScript.
  • Designed and optimized database schemas using Django’s MVT framework, MySQL, and Cassandra, enhancing query performance and reducing data retrieval time.
  • Implemented Apache Storm topologies for real-time data extraction and processing, reducing data latency and improving ETL efficiency by 35%.
  • Analyzed and formatted data using machine learning algorithms with Python’s Scikit-Learn, improving data processing efficiency by 30%.
  • Developed backend services using Django, Python, and REST web services, optimizing ORM queries to enhance API response times.
  • Created, activated, and managed Anaconda environments for seamless development and execution of ML models, enhancing model deployment speed by 25%.
  • Built dynamic and responsive user interfaces using JavaScript, AJAX, JSON, jQuery, HTML5, and CSS3, increasing user engagement and loading speed.
  • Optimized Python code performance, implemented multithreading, and enhanced database query execution, improving system efficiency by 40%.
  • Automated system maintenance with shell scripting and CRON jobs, reducing manual intervention, and improving deployment efficiency.
  • Implemented unit and functional testing using Python’s unittest, unittest2, mock, and custom frameworks, ensuring software reliability and increasing testing efficiency.
  • Developed reusable ORM-based solutions, simplified complex SQL queries, and monitored application health using JIRA, accelerating query execution, and improving Agile workflow.
  • Built machine learning models using Python, R, and MATLAB, managing large datasets with Pandas, and optimizing data processing speed, while improving business decision-making through data visualization.
  • Worked in an Agile development environment, integrating testing frameworks with CI/CD pipelines for faster software delivery and improved team collaboration by 40%.
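
The unit-testing bullet above mentions Python's unittest and mock; the sketch below shows that style of test in a self-contained form. The function under test and the mocked client are hypothetical examples, not code from the project.

# Minimal unittest + mock sketch. The function under test is inlined as a
# hypothetical example so the file is self-contained and runnable.
import unittest
from unittest import mock


def fetch_user_count(client):
    """Hypothetical function under test: counts rows returned by an API client."""
    rows = client.get("/users")
    return len(rows)


class FetchUserCountTest(unittest.TestCase):
    def test_counts_rows_returned_by_client(self):
        # Replace the real client with a mock that returns three fake rows.
        fake_client = mock.Mock()
        fake_client.get.return_value = [{"id": 1}, {"id": 2}, {"id": 3}]

        self.assertEqual(fetch_user_count(fake_client), 3)
        fake_client.get.assert_called_once_with("/users")


if __name__ == "__main__":
    unittest.main()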

Education

Master of Science - COMPUTER SCIENCE

UNIVERSITY OF BRIDGEPORT
Bridgeport, CT
04-2023

Skills

  • Apache Spark
  • HDFS
  • MapReduce
  • HIVE
  • SQOOP
  • Oozie
  • Zookeeper
  • Kafka
  • Flume
  • Splunk
  • Python
  • Java
  • Scala
  • R
  • SQL
  • PL/SQL
  • Linux Shell Script
  • HiveQL
  • Oracle 11g
  • Oracle 10g
  • MySQL
  • Teradata
  • MS-SQL Server
  • DB2
  • HTML
  • XML
  • JDBC
  • JSP
  • CSS
  • JavaScript
  • SOAP
  • DataDog
  • Eclipse
  • IntelliJ
  • NetBeans
  • PuTTY
  • WinSCP
  • Linux
  • Unix
  • Windows
  • Mac OS-X
  • CentOS
  • Red Hat
  • Agile
  • Scrum
  • Waterfall
  • Cloudera
  • Hortonworks
  • MapR
  • AWS
  • Amazon S3
  • Amazon EC2
  • Amazon EMR
  • AWS Lambda
  • AWS Glue
  • Amazon Athena
  • Amazon Redshift
  • RBD
  • Azure
  • Azure Data Lake
  • Azure Data Factory
  • Azure Databricks
  • Azure SQL Database
  • Azure SQL Data Warehouse
  • GCP
  • Hadoop
  • Spark
  • Django
  • Flask
  • Informatica PowerCenter
  • AWS Glue
  • Data Management
  • Oracle Data Integrator

Timeline

Senior Big Data Engineer

Optum
02.2023 - Current

Azure Data Engineer

Cognizant Technologies
08.2020 - 11.2021

Python Developer

Kohl's
09.2018 - 06.2020

Data Analyst

jhhjj
08.2016 - 07.2018

Master of Science - COMPUTER SCIENCE

UNIVERSITY OF BRIDGEPORT