
Naga M

Summary

To leverage my 5+ years of experience as a Data Engineer to contribute effectively to a dynamic organization where I can apply my expertise in designing, developing, and maintaining scalable data infrastructure and solutions.

Overview

6 years of professional experience
1 Certification

Work History

Azure Data Engineer

KeyBank
Ohio, USA
07.2023 - Current
  • Company Overview: KeyBank is a prominent regional bank providing comprehensive banking, lending, and financial services across the United States
  • Led and executed end-to-end data management and application development initiatives, including data wrangling, pipeline design and database management
  • Managed Oracle databases, ensuring high availability, performance tuning, and security compliance for critical business applications
  • Performed data wrangling to clean, transform, and reshape data using the pandas library (see the illustrative sketch at the end of this role)
  • Used Docker for managing the application environments
  • Built Jenkins jobs for CI/CD infrastructure for GitHub repositories
  • Designed and configured databases, back-end applications, and programs
  • Managed large datasets using pandas DataFrames and SQL
  • Deployed and managed containerized banking applications on OpenShift, enabling scalable and efficient application delivery in a secure environment
  • SQL Query Optimization: Developed and optimized complex SQL queries and PL/SQL procedures to improve data retrieval performance and reduce response times by up to 30%
  • Presented the project to faculty and industry experts, showcasing the pipeline's effectiveness in providing real-time insights for marketing and brand management
  • Managed and monitored container security using OpenShift’s integrated tools to ensure the confidentiality, integrity, and availability of banking applications
  • Used Python-based GUI components for front-end functionality such as selection criteria
  • Developed and deployed machine learning models to detect and prevent fraudulent transactions, achieving a significant reduction in false positives and enhancing the accuracy of fraud detection
  • Responsible for building and testing applications
  • Handled database issues and connections with SQL and NoSQL databases such as MongoDB by installing and configuring various Python packages (Teradata, MySQL, MySQL Connector, PyMongo, and SQLAlchemy)
  • Conducted thorough performance analysis and tuning of Oracle instances, resulting in a 20% improvement in overall system efficiency
  • Implemented AI/ML models to improve credit scoring and risk assessment processes, leveraging historical data and predictive analytics to make more accurate creditworthiness evaluations
  • Created pipelines in Azure Data Factory utilizing Linked Services to extract, transform, and load data from multiple sources such as Azure SQL Data Warehouse and write-back tools
  • Used Kafka capabilities such as distribution, partitioning, and the replicated commit log service for messaging systems, maintaining data feeds
  • Involved in loading data from REST endpoints to Kafka
  • Created Kubernetes replication controllers, clusters, and label services to deploy microservices in Docker
  • Optimized ETL workflows by fine-tuning SQL queries, adjusting data processing settings, and implementing parallel processing, resulting in [X]% reduction in processing time
  • Used Python to write data into JSON files for testing Django websites; created scripts for data modelling and data import/export
  • Designed and implemented MS SQL Server databases, including schema design, stored procedures, and triggers to support application requirements
  • Deployed AI-powered chatbots and virtual assistants to handle customer inquiries, automate routine tasks, and provide 24/7 support, improving customer service efficiency and satisfaction
  • Implemented robust network security measures in compliance with HIPAA and other healthcare regulations, including encryption, firewalls, and VPNs, to protect sensitive patient data transmitted over LAN and WAN
  • Led requirement gathering, business analysis, and technical design for Hadoop and Big Data projects
  • Developed ETL processes using SQL Server Integration Services (SSIS) for efficient data import/export and transformation, improving data accuracy by 25%
  • Managed relational database services in which the Azure SQL handles reliability, scaling, and maintenance
  • Integrated data storage solutions
  • Analyzed and developed a modern data solution with Azure PaaS service to enable data visualization
  • Understood the application's current Production state and the impact of new installation on existing business processes
  • Monitored ETL jobs for performance issues and implemented improvements to enhance data load speed and reliability
  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries
  • Processed image data through the Hadoop distributed system using MapReduce, then stored it in HDFS
  • Created complex reports and dashboards using SQL Server Reporting Services (SSRS) to provide actionable insights for business decision-making
  • Wrote Python scripts to import and export data in CSV and Excel formats across environments, and created a Celery action using a REST API call
  • Implemented OpenShift-based development and testing environments for banking applications, accelerating development cycles and ensuring consistent and reliable testing outcomes
  • Developed scalable and reusable database processes and integrated them
  • Worked on data management disciplines including data integration, modeling, and other areas directly relevant to business intelligence/business analytics development
  • Experienced in both Oracle and MS SQL environments, with the ability to adapt to various database technologies and tools
  • Involved in development of Web Services using SOAP for sending and getting data from the external interface in the XML format
  • Performed Metadata validation, reconciliation and appropriate error handling in ETL processes
  • Installed and automated applications using the configuration management tools Puppet and Chef
  • Leveraged OpenShift's integrated CI/CD pipelines to automate build, test, and deployment processes, reducing time-to-market for new banking features and updates
  • Experience working with large datasets and machine learning classes using TensorFlow and Apache Spark
  • Worked on AngularJS to augment browser applications with MVC capability
  • Managed Red Hat Enterprise Linux (RHEL) environments, ensuring system stability, performance, and security for critical applications
  • Architect & implement medium to large scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics)
  • Fetched Twitter feeds for a certain important keyword using the python-twitter library
  • Installed, configured, and maintained RHEL systems, including kernel tuning and package management using YUM and RPM
  • Significantly optimized Python code to enhance performance and efficiency
  • Developed Python scripts to simulate hardware for testing purposes using Simi's simulator
  • Environment: Databricks, Azure Synapse, Cosmos DB, ADF, SSRS, Power BI, Azure Data Lake, ARM, Azure HDInsight, Blob Storage, Apache Spark, Azure ADF V2, ADLS, Spark SQL, Python/Scala, Ansible scripts, Kubernetes, Docker, Jenkins, Azure SQL DW (Synapse), Azure SQL DB
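
A minimal illustrative sketch of the pandas-based data wrangling described in this role; the file name, column names, and aggregation are assumptions invented for the example, not taken from the actual KeyBank work.

    import pandas as pd

    # Load a hypothetical raw transactions extract (file and columns are invented for this sketch)
    raw = pd.read_csv("transactions_raw.csv", parse_dates=["txn_date"])

    # Clean: drop duplicates and rows missing key fields
    clean = raw.drop_duplicates().dropna(subset=["account_id", "amount"]).copy()

    # Transform: normalize amounts and derive a month column
    clean["amount"] = clean["amount"].astype(float)
    clean["txn_month"] = clean["txn_date"].dt.to_period("M").astype(str)

    # Reshape: pivot to monthly totals per account
    monthly = clean.pivot_table(index="account_id", columns="txn_month",
                                values="amount", aggfunc="sum", fill_value=0)
    monthly.to_csv("monthly_totals.csv")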

AWS Data Engineer

L Brands
Ohio, USA
03.2022 - 06.2023
  • Company Overview: L Brands, renowned for its iconic brands like Victoria's Secret and Bath & Body Works, offers a wide array of apparel, lingerie, beauty products, and accessories
  • Spearheaded initiatives in cloud infrastructure automation, optimized data pipelines, and facilitated robust data analysis
  • Provisioned high availability of AWS EC2 instances, migrated legacy systems to AWS, and developed Terraform plugins, modules, and templates for automating AWS infrastructure
  • Used Continuous Delivery Pipeline
  • Deployed microservices, including provisioning Azure environments, and developed modules using Python and shell scripting
  • Implemented Python automation for Capital Analysis and Review, leveraging Pandas and NumPy modules to manipulate and analyze data, ensuring accurate reporting and streamlined decision-making
  • Conducted regular audits of LAN and WAN infrastructure to ensure compliance with healthcare data security standards and regulations, and prepared detailed reports for regulatory reviews
  • Configured and maintained network settings, including firewall rules using iptables and network services like DHCP, DNS, and NFS
  • Wrote AWS Lambda functions in Spark with cross-functional dependencies, generating custom libraries for delivering the Lambda functions in the cloud
  • Performed raw data ingestion, which triggered a Lambda function and put refined data into ADLS
  • Spearheaded HBase setup and utilized Spark and SparkSQL to develop faster data pipelines, resulting in a 60% reduction in processing time and improved data accuracy
  • Created datasets from S3 using AWS Athena and created visual insights using AWS QuickSight; monitored data quality and integrity with end-to-end testing and reverse engineering, and documented existing programs and code
  • Achieved 70% faster EMR cluster launch and configuration, optimized Hadoop job processing by 60%, improved system stability, and utilized Boto3 for seamless file writing to S3 buckets (see the illustrative sketch at the end of this role)
  • Analyzed the functional requirement documents from the business
  • Built scalable data infrastructure on cloud platforms, such as AWS and GCP, using Kubernetes and Docker
  • Built data pipelines using Python and Apache Airflow for ETL-related jobs
  • Involved in the entire lifecycle of the projects including Design, Development, and Deployment, Testing and Implementation, and support
  • Worked with Docker containers, developing images and hosting them in Artifactory
  • Worked on big data integration and analytics based on Hadoop, Solr, PySpark, Kafka, Storm, and webMethods
  • Worked on CI/CD tools like Jenkins and Docker in the DevOps team, setting up the end-to-end application process using continuous deployment for lower environments and continuous delivery with approvals for higher environments
  • Collected and aggregated large amounts of web log data from different sources such as webservers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis
  • Implemented security best practices for RHEL, including SELinux configurations, auditing, and vulnerability assessments to safeguard systems
  • Worked on migration of data from on-prem SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB)
  • Involved in database migration methodologies and integration conversion solutions to convert legacy ETL processes into Azure Synapse compatible architecture
  • Automated and monitored AWS infrastructure with Terraform for high availability and reliability, reducing infrastructure management time by 90% and improving system uptime
  • Zookeeper was utilized to manage synchronization, serialization, and coordination throughout the cluster after migrating from JMS Solace to Kinesis
  • Developed tools using Python, Shell scripting, XML to automate tasks
  • Developed metrics based on SAS scripts on the legacy system, migrating the metrics to Snowflake (AWS)
  • Worked on the Java Message Service (JMS) API to develop a message-oriented middleware (MOM) layer for handling various asynchronous requests
  • Involved in building database models, APIs, and views using Python to build interactive web-based solutions
  • Implemented a reusable plug-and-play Python pattern (Synapse integration, aggregations, change data capture, deduplication, and high-watermark implementation)
  • This pattern accelerated development time and standardization across teams
  • Used the Python library Beautiful Soup for web scraping
  • Explored PySpark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs
  • Environment: AWS EMR, Spark, Scala, Python, Hive, Sqoop, Oozie, Kafka, YARN, JIRA, S3, Redshift, Athena, Shell Scripting, GitHub, Maven
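
A minimal illustrative sketch of writing processed files to S3 with Boto3, as referenced in this role; the bucket, key, and file paths are placeholders invented for the example.

    import boto3

    # Bucket and key names are assumptions for this sketch
    BUCKET = "example-data-lake"
    KEY = "curated/orders/2023-06-01/orders.parquet"

    s3 = boto3.client("s3")

    # Upload a locally produced file to the curated zone of the bucket
    s3.upload_file(Filename="/tmp/orders.parquet", Bucket=BUCKET, Key=KEY)

    # Verify the object landed by listing the prefix
    resp = s3.list_objects_v2(Bucket=BUCKET, Prefix="curated/orders/2023-06-01/")
    for obj in resp.get("Contents", []):
        print(obj["Key"], obj["Size"])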

Application Developer/ Data Engineer

Cipla Health Ltd
Bangalore, India
05.2020 - 11.2021
  • Company Overview: Cipla Health Ltd is a subsidiary of Cipla Ltd focused on providing consumer healthcare products, emphasizing wellness and self-care solutions
  • Designed and implemented robust data pipelines using Azure Data Factory, integrated with Azure services like Blob Storage, Databricks, and Azure SQL Data Warehouse
  • Implemented data governance policies in Snowflake to ensure compliance with HIPAA regulations and maintain patient data confidentiality and integrity
  • Created pipelines in ADF using Linked Services/Datasets/Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and write-back tools
  • Analyzed existing systems and proposed improvements in processes and systems, adopting modern scheduling tools like Airflow and migrating the legacy systems into an enterprise data lake built on Azure Cloud
  • Wrote queries in MySQL and native SQL
  • Involved in monitoring and scheduling the pipelines using Triggers in Azure Data Factory
  • Maintain code version control and hold accountability for Prod-stage validation and deployment
  • Integrated Azure Data Factory with Blob Storage to move data through Databricks for processing and then to Azure Data Lake Storage and Azure SQL Data Warehouse
  • Working knowledge of Kubernetes to deploy, scale, load balance, and manage Docker containers, and of OpenShift with multiple namespace versions
  • Have used T-SQL for MS SQL server and ANSI SQL extensively on disparate databases
  • Configured role-based access controls within Snowflake to manage user permissions, safeguarding sensitive healthcare data while allowing appropriate access for analysis
  • Stored different configs in the NoSQL database MongoDB and manipulated the configs using PyMongo (see the illustrative sketch at the end of this role)
  • Developing and maintaining Azure Analysis Services models to support business intelligence and data analytics requirements, creating measures, dimensions, and hierarchies for reporting and visualization
  • Configured Spark Streaming to get ongoing information from Kafka and store the stream data in DBFS
  • Deployed models as a Python package, as an API for backend integration, and as services in a microservices architecture with a Kubernetes orchestration layer for the Docker containers
  • Involved in various phases of Software Development Lifecycle (SDLC) of the application, like gathering requirements, design, development, deployment, and analysis of the application
  • Responsible for estimating cluster size, monitoring, and troubleshooting the Spark Databricks cluster; applied the Spark DataFrame API to complete data manipulation within Spark sessions
  • Instantiated, created, and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications
  • Analyzed and optimized SQL queries in Snowflake to improve performance and reduce costs associated with compute resources, enhancing data retrieval times for healthcare analytics
  • Implemented Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases (Oracle) for product level forecast
  • Extracted the data from Teradata into HDFS using Sqoop
  • Environment: Azure, Oracle, Kafka, Python, Informatica, SQL Server, Erwin, RDS, NoSQL, Snowflake Schema, MySQL, Bash, DynamoDB, PostgreSQL, Tableau, GitHub, Linux/Unix
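
A minimal illustrative sketch of storing and updating pipeline configs in MongoDB with PyMongo, as referenced in this role; the connection string, database, collection, and document fields are assumptions invented for the example.

    from pymongo import MongoClient

    # Connection string and names are placeholders for this sketch
    client = MongoClient("mongodb://localhost:27017")
    configs = client["app_settings"]["pipeline_configs"]

    # Insert or replace a pipeline config document keyed by name
    configs.replace_one(
        {"name": "daily_sales_load"},
        {"name": "daily_sales_load", "schedule": "0 2 * * *", "retries": 3, "enabled": True},
        upsert=True,
    )

    # Read the config back and toggle a flag
    doc = configs.find_one({"name": "daily_sales_load"})
    configs.update_one({"name": "daily_sales_load"}, {"$set": {"enabled": not doc["enabled"]}})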

Data Engineer

Huawei Technologies
Bangalore, India
03.2019 - 04.2020
  • Company Overview: Huawei Technologies is a global leader in telecommunications equipment and consumer electronics, known for its innovation in mobile networks, smartphones, and ICT solutions
  • Designed and implemented robust data integration solutions across on-premises and cloud environments using Azure Data Factory
  • Designed and implemented data integration solutions using Azure Data Factory to move data between various data sources, including on-premises and cloud-based systems
  • Implemented Azure Data Lake, Azure Data Factory, and Azure Databricks to move and conform data from on-premises to cloud to serve the company's analytical needs
  • Expertise in creating and developing applications for the Android operating system using Android Studio, Eclipse IDE, SQLite, Java, XML, the Android SDK, and the ADT plugin
  • Set up the base Python structure with the create-python-app package, SSRS, and PySpark
  • Performed ETL to move the data from source system to destination systems and worked on the Data warehouse
  • Expertise in business intelligence and data visualization tools like Tableau; used Tableau to connect to various sources and build graphs
  • Extracted and transformed log data files from S3 by scheduling AWS Glue jobs and loaded the transformed data into Amazon Elasticsearch
  • Managed large datasets using pandas DataFrames and SQL
  • Responsible for loading the data from BDW Oracle database, Teradata into HDFS using Sqoop
  • Implemented AJAX, JSON, and JavaScript to create interactive web screens
  • Used the Python NLTK toolkit in smart MMI interactions
  • Used PowerBI as a front-end BI tool to design and develop dashboards, workbooks, and complex aggregate calculations
  • Used Python's unittest library for testing various Python programs and other code
  • Applied the Spark DataFrame API to complete data manipulation within Spark sessions (see the illustrative sketch at the end of this role)
  • Environment: CDH, Pig, Hive, MapReduce, YARN, Oozie, Flume, Sqoop, Impala, Spark, Scala, SQL Server, Teradata, Fast Export, Oracle, Shell Scripting
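
A minimal illustrative sketch of Spark DataFrame manipulation within a Spark session, as referenced in this role; the in-memory dataset, column names, and aggregation are assumptions invented for the example.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dataframe-demo").getOrCreate()

    # Small in-memory dataset standing in for a real source table
    df = spark.createDataFrame(
        [("r1", "north", 120.0), ("r2", "south", 80.5), ("r3", "north", 45.0)],
        ["order_id", "region", "amount"],
    )

    # Typical DataFrame manipulation: filter, derive a column, aggregate
    result = (
        df.filter(F.col("amount") > 50)
          .withColumn("amount_band", F.when(F.col("amount") > 100, "high").otherwise("mid"))
          .groupBy("region", "amount_band")
          .agg(F.sum("amount").alias("total_amount"))
    )

    result.show()
    spark.stop()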

Education

Master's - Computer Science

State University of New York Institute of Technology
USA
01.2024

Skills

  • AWS Services
  • Hadoop Components / Big Data
  • Databases
  • Programming Languages
  • Web Servers
  • IDE
  • NoSQL Databases
  • Methodologies
  • Cloud Services
  • ETL Tools
  • Reporting and ETL Tools
  • Data Warehousing
  • Machine Learning
  • Data Modeling
  • Big data technologies
  • Data Analysis
  • Team Collaboration
  • Data Analytics
  • Data Migration
  • Scripting Languages

Certification

  • Azure Fundamentals certified by Microsoft
  • Linux Administrator
  • CCNA (in progress)
  • CISSP

Languages

  • Spanish, Advanced
  • English, Native/Bilingual

