To leverage my 5+ years of experience as a Data Engineer to contribute effectively to a dynamic organization where I can apply my expertise in designing, developing, and maintaining scalable data infrastructure and solutions.
Overview
6 years of professional experience
1 Certification
Work History
Azure Data Engineer
KeyBank
Ohio, USA
07.2023 - Current
Company Overview: KeyBank is a prominent regional bank providing comprehensive banking, lending, and financial services across the United States
Led and executed end-to-end data management and application development initiatives, including data wrangling, pipeline design and database management
Managed Oracle databases, ensuring high availability, performance tuning, and security compliance for critical business applications
Performed data wrangling to clean, transform, and reshape data using the pandas library (see the pandas sketch at the end of this section)
Used Docker for managing the application environments
Built Jenkins jobs to provide CI/CD infrastructure for GitHub repositories
Designed and configured databases, back-end applications, and programs
Managed large datasets using pandas DataFrames and SQL
Deployed and managed containerized banking applications on OpenShift, enabling scalable and efficient application delivery in a secure environment
SQL Query Optimization: Developed and optimized complex SQL queries and PL/SQL procedures to improve data retrieval performance and reduce response times by up to 30%
Presented the project to faculty and industry experts, showcasing the pipeline's effectiveness in providing real-time insights for marketing and brand management
Managed and monitored container security using OpenShift’s integrated tools to ensure the confidentiality, integrity, and availability of banking applications
Used Python-based GUI components for front-end functionality such as selection criteria
Developed and deployed machine learning models to detect and prevent fraudulent transactions, achieving a significant reduction in false positives and enhancing the accuracy of fraud detection
Responsible for building and testing applications
Handled database issues and connections with SQL and NoSQL databases such as MongoDB by installing and configuring Python packages (Teradata, MySQL, MySQL Connector, PyMongo, and SQLAlchemy)
Conducted thorough performance analysis and tuning of Oracle instances, resulting in a 20% improvement in overall system efficiency
Implemented AI/ML models to improve credit scoring and risk assessment processes, leveraging historical data and predictive analytics to make more accurate creditworthiness evaluations
Created pipelines in Azure Data Factory using Linked Services to extract, transform, and load data from multiple sources, such as Azure SQL Data Warehouse and a write-back tool, and back again
Used Kafka capabilities such as distribution, partitioning, and the replicated commit log service for messaging systems, maintaining data feeds
Loaded data from REST endpoints into Kafka (see the Kafka producer sketch at the end of this section)
Created Kubernetes replication controllers, clusters, labels, and services to deploy microservices in Docker
Optimized ETL workflows by fine-tuning SQL queries, adjusting data processing settings, and implementing parallel processing, resulting in a [X]% reduction in processing time
Used Python to write data into JSON files for testing Django websites; created scripts for data modeling and data import/export
Designed and implemented MS SQL Server databases, including schema design, stored procedures, and triggers to support application requirements
Deployed AI-powered chatbots and virtual assistants to handle customer inquiries, automate routine tasks, and provide 24/7 support, improving customer service efficiency and satisfaction
Implemented robust network security measures in compliance with HIPAA and other healthcare regulations, including encryption, firewalls, and VPNs, to protect sensitive patient data transmitted over LAN and WAN
Led requirement gathering, business analysis, and technical design for Hadoop and Big Data projects
Developed ETL processes using SQL Server Integration Services (SSIS) for efficient data import/export and transformation, improving data accuracy by 25%
Managed relational database services in which the Azure SQL handles reliability, scaling, and maintenance
Integrated data storage solutions
Analyzed and developed a modern data solution with Azure PaaS service to enable data visualization
Understood the application's current Production state and the impact of new installation on existing business processes
Monitored ETL jobs for performance issues and implemented improvements to enhance data load speed and reliability
Loaded and transformed large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries
Processed image data through the Hadoop distributed system using MapReduce, then stored it in HDFS
Created complex reports and dashboards using SQL Server Reporting Services (SSRS) to provide actionable insights for business decision-making
Wrote Python scripts to import and export data in CSV and Excel formats across environments, and created a Celery task invoked via a REST API call (see the Celery sketch at the end of this section)
Implemented OpenShift-based development and testing environments for banking applications, accelerating development cycles and ensuring consistent and reliable testing outcomes
Developed scalable, reusable database processes and integrated them
Worked on data management disciplines including data integration, modeling, and other areas directly relevant to business intelligence and analytics development
Experienced in both Oracle and MS SQL environments, with the ability to adapt to various database technologies and tools
Developed SOAP web services to send and receive XML data from external interfaces
Performed Metadata validation, reconciliation and appropriate error handling in ETL processes
Installed and automated applications using the configuration management tools Puppet and Chef
Leveraged OpenShift's integrated CI/CD pipelines to automate build, test, and deployment processes, reducing time-to-market for new banking features and updates
Worked with large data sets and machine learning using TensorFlow and Apache Spark
Worked with AngularJS to augment browser applications with MVC capability
Managed Red Hat Enterprise Linux (RHEL) environments, ensuring system stability, performance, and security for critical applications
Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics)
Fetched Twitter feeds for a certain important keyword using the python-twitter library
Installed, configured, and maintained RHEL systems, including kernel tuning and package management using YUM and RPM
Significantly optimized Python code to enhance performance and efficiency
Developed Python scripts to simulate hardware for testing purposes using the Simics simulator
Environment: Databricks, Azure Synapse, Cosmos DB, ADF, SSRS, Power BI, Azure Data Lake, ARM, Azure HDInsight, Blob Storage, Apache Spark, Azure ADF V2, ADLS, Spark SQL, Python/Scala, Ansible scripts, Kubernetes, Docker, Jenkins, Azure SQL DW (Synapse), Azure SQL DB
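Below is a minimal, illustrative sketch of the pandas-based data wrangling referenced above; the file and column names are hypothetical placeholders rather than production artifacts.

    import pandas as pd

    # Load a raw extract (hypothetical file and column names)
    df = pd.read_csv("transactions_raw.csv")

    # Clean: drop duplicates and rows missing key fields
    df = df.drop_duplicates().dropna(subset=["account_id", "amount"])

    # Transform: normalize types and derive a month column
    df["amount"] = df["amount"].astype(float)
    df["posted_at"] = pd.to_datetime(df["posted_at"])
    df["month"] = df["posted_at"].dt.to_period("M")

    # Reshape: pivot to monthly totals per account
    monthly = df.pivot_table(index="account_id", columns="month",
                             values="amount", aggfunc="sum", fill_value=0)
    monthly.to_csv("transactions_monthly.csv")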
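A hedged sketch of loading data from a REST endpoint into Kafka using the kafka-python client; the endpoint URL, topic name, and broker address are assumptions for illustration only.

    import json
    import requests
    from kafka import KafkaProducer  # kafka-python client

    # Hypothetical endpoint, topic, and broker
    ENDPOINT = "https://api.example.com/v1/events"
    TOPIC = "events.raw"

    producer = KafkaProducer(
        bootstrap_servers=["localhost:9092"],
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # Pull records from the REST endpoint and publish each one to Kafka
    response = requests.get(ENDPOINT, timeout=30)
    response.raise_for_status()
    for record in response.json():
        producer.send(TOPIC, value=record)
    producer.flush()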
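An illustrative sketch of the CSV/Excel export script exposed as a Celery task that calls a REST API; the broker URL and notification endpoint are hypothetical.

    import pandas as pd
    import requests
    from celery import Celery

    # Hypothetical broker URL and notification endpoint
    app = Celery("exports", broker="redis://localhost:6379/0")

    @app.task
    def export_and_notify(csv_path: str) -> str:
        """Convert a CSV extract to Excel and notify a downstream service."""
        df = pd.read_csv(csv_path)
        out_path = csv_path.replace(".csv", ".xlsx")
        df.to_excel(out_path, index=False)  # requires openpyxl
        requests.post("https://api.example.com/exports",
                      json={"path": out_path}, timeout=30)
        return out_path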
AWS Data Engineer
L Brands
Ohio, USA
03.2022 - 06.2023
Company Overview: L Brands, renowned for its iconic brands like Victoria's Secret and Bath & Body Works, offers a wide array of apparel, lingerie, beauty products, and accessories
Spearheaded initiatives in cloud infrastructure automation, optimized data pipelines, and facilitated robust data analysis
Provisioned highly available AWS EC2 instances, migrated legacy systems to AWS, and developed Terraform plugins, modules, and templates to automate AWS infrastructure
Used a continuous delivery pipeline
Deployed microservices, including provisioning Azure environments, and developed modules using Python and shell scripting
Implemented Python automation for Capital Analysis and Review, leveraging Pandas and NumPy modules to manipulate and analyze data, ensuring accurate reporting and streamlined decision-making
Conducted regular audits of LAN and WAN infrastructure to ensure compliance with healthcare data security standards and regulations, and prepared detailed reports for regulatory reviews
Configured and maintained network settings, including firewall rules using iptables and network services like DHCP, DNS, and NFS
Wrote AWS Lambda functions in Spark with cross-functional dependencies, generating custom libraries for delivering the Lambda functions in the cloud
Performed raw data ingestion, which triggered a Lambda function and put refined data into ADLS
Spearheaded HBase setup and utilized Spark and SparkSQL to develop faster data pipelines, resulting in a 60% reduction in processing time and improved data accuracy
Created datasets from S3 using AWS Athena and built visual insights using Amazon QuickSight; monitored data quality and integrity with end-to-end testing, and reverse-engineered and documented existing programs and code
Achieved 70% faster EMR cluster launch and configuration, optimized Hadoop job processing by 60%, improved system stability, and utilized Boto3 for seamless file writing to S3 buckets (see the Boto3 sketch at the end of this section)
Analyzed functional requirement documents from the business
Built scalable data infrastructure on cloud platforms, such as AWS and GCP, using Kubernetes and Docker
Built data pipelines using Python and Apache Airflow for ETL-related jobs (see the Airflow sketch at the end of this section)
Involved in the entire lifecycle of the projects, including design, development, deployment, testing, implementation, and support
Worked with Docker containers, developing images and hosting them in Artifactory
Worked on big data integration and analytics based on Hadoop, Solr, PySpark, Kafka, Storm, and webMethods
Worked on CI/CD tools such as Jenkins and Docker in the DevOps team, setting up the application process end to end with continuous deployment for lower environments and continuous delivery (with approvals) for higher environments
Collected and aggregated large amounts of web log data from different sources such as webservers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis
Implemented security best practices for RHEL, including SELinux configurations, auditing, and vulnerability assessments to safeguard systems
Worked on migration of data from on-prem SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB)
Involved in database migration methodologies and integration conversion solutions to convert legacy ETL processes into Azure Synapse compatible architecture
Automated and monitored AWS infrastructure with Terraform for high availability and reliability, reducing infrastructure management time by 90% and improving system uptime
Utilized ZooKeeper to manage synchronization, serialization, and coordination throughout the cluster after migrating from JMS Solace to Kinesis
Developed tools using Python, Shell scripting, XML to automate tasks
Developed metrics based on SAS scripts on the legacy system and migrated them to Snowflake (AWS)
Worked on the Java Message Service (JMS) API to develop a message-oriented middleware (MOM) layer for handling various asynchronous requests
Built database models, APIs, and views using Python to deliver interactive web-based solutions
Implemented a reusable plug-and-play Python pattern (Synapse integration, aggregations, change data capture, deduplication, and high-watermark implementation), which accelerated development time and standardized implementations across teams
Used the Python library Beautiful Soup for web scraping
Explored PySpark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs (see the PySpark sketch at the end of this section)
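A brief sketch of writing output files to S3 with Boto3, as in the EMR bullet above; the bucket and key names are placeholders, not real resources.

    import boto3

    s3 = boto3.client("s3")

    def upload_output(local_path: str) -> None:
        """Upload a locally produced output file to S3 (bucket and key are placeholders)."""
        s3.upload_file(local_path, "analytics-landing", "emr/output/part-0000.parquet")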
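A minimal sketch of an Airflow DAG for the Python-based ETL jobs mentioned above; the file paths, column names, and schedule are assumed for illustration.

    from datetime import datetime

    import pandas as pd
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Hypothetical source and staging paths
    SRC = "/data/raw/orders.csv"
    STAGED = "/data/staged/orders.parquet"

    def extract_transform():
        df = pd.read_csv(SRC)
        df = df.dropna(subset=["order_id"])  # basic cleansing before load
        df.to_parquet(STAGED)

    with DAG(
        dag_id="orders_etl",
        start_date=datetime(2022, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(task_id="extract_transform", python_callable=extract_transform)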
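An illustrative PySpark snippet showing the DataFrame API alongside a pair-RDD formulation, in the spirit of the optimization work described above; the input path and schema are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("weblog-aggregation").getOrCreate()

    # DataFrame API: column pruning plus a built-in aggregation
    logs = spark.read.json("hdfs:///data/weblogs/")  # hypothetical path
    hits_per_host = (logs.select("host", "status")
                         .where(F.col("status") == 200)
                         .groupBy("host")
                         .count())

    # Pair-RDD formulation of the same count, shown for comparison with the legacy job
    pair_counts = (logs.rdd.filter(lambda r: r["status"] == 200)
                           .map(lambda r: (r["host"], 1))
                           .reduceByKey(lambda a, b: a + b))

    hits_per_host.write.mode("overwrite").parquet("hdfs:///data/weblogs_agg/")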
Company Overview: Cipla Health Ltd is a subsidiary of Cipla Ltd focused on providing consumer healthcare products, emphasizing wellness and self-care solutions
Designed and implemented robust data pipelines using Azure Data Factory, integrated with Azure services like Blob Storage, Databricks, and Azure SQL Data Warehouse
Implemented data governance policies in Snowflake to ensure compliance with HIPAA regulations and maintain patient data confidentiality and integrity
Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and back again
Analyzed existing systems, proposed process and system improvements including modern scheduling tools like Airflow, and migrated legacy systems into an enterprise data lake built on Azure Cloud
Wrote queries in MySQL and native SQL
Involved in monitoring and scheduling the pipelines using Triggers in Azure Data Factory
Maintained code version control and held accountability for production-stage validation and deployment
Integrated Azure Data Factory with Blob Storage to move data through Data Bricks for processing and then to Azure Data Lake Storage and Azure SQL data warehouse
Working knowledge of Kubernetes to deploy, scale, load-balance, and manage Docker containers, and of OpenShift with multiple namespaces and versions
Used T-SQL for MS SQL Server and ANSI SQL extensively on disparate databases
Configured role-based access controls within Snowflake to manage user permissions, safeguarding sensitive healthcare data while allowing appropriate access for analysis
Stored configuration data in the NoSQL database MongoDB and manipulated the configs using PyMongo (see the PyMongo sketch at the end of this section)
Developed and maintained Azure Analysis Services models to support business intelligence and data analytics requirements, creating measures, dimensions, and hierarchies for reporting and visualization
Configured Spark Streaming to consume ongoing data from Kafka and store the stream output in DBFS (see the streaming sketch at the end of this section)
Deployed models as Python packages, as APIs for backend integration, and as services in a microservices architecture with a Kubernetes orchestration layer for the Docker containers
Involved in various phases of Software Development Lifecycle (SDLC) of the application, like gathering requirements, design, development, deployment, and analysis of the application
Responsible for estimating cluster size, monitoring, and troubleshooting the Spark Databricks cluster; applied the Spark DataFrame API to complete data manipulation within the Spark session
Created and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications
Analyzed and optimized SQL queries in Snowflake to improve performance and reduce costs associated with compute resources, enhancing data retrieval times for healthcare analytics
Implemented Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases (Oracle) for product level forecast
Extracted the data from Teradata into HDFS using Sqoop
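A small sketch of storing and manipulating pipeline configs in MongoDB with PyMongo, as referenced above; the connection string, database, and document fields are assumptions.

    from pymongo import MongoClient

    # Hypothetical connection string, database, and collection
    client = MongoClient("mongodb://localhost:27017")
    configs = client["pipeline_meta"]["configs"]

    # Upsert a pipeline config document
    configs.update_one(
        {"name": "adf_sales_load"},
        {"$set": {"batch_size": 5000, "enabled": True}},
        upsert=True,
    )

    # Read it back and flip a flag
    doc = configs.find_one({"name": "adf_sales_load"})
    configs.update_one({"_id": doc["_id"]}, {"$set": {"enabled": not doc["enabled"]}})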
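A hedged sketch of the Kafka-to-DBFS streaming flow noted above, written here with Spark Structured Streaming; the broker, topic, and DBFS paths are placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-to-dbfs").getOrCreate()

    # Hypothetical broker, topic, and DBFS paths
    stream = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "patient_events")
              .load())

    # Persist the raw value payload to DBFS as Parquet with checkpointing
    query = (stream.selectExpr("CAST(value AS STRING) AS payload")
                   .writeStream.format("parquet")
                   .option("path", "dbfs:/mnt/raw/patient_events/")
                   .option("checkpointLocation", "dbfs:/mnt/checkpoints/patient_events/")
                   .start())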
Company Overview: Huawei Technologies is a global leader in telecommunications equipment and consumer electronics, known for its innovation in mobile networks, smartphones, and ICT solutions
Designed and implemented robust data integration solutions using Azure Data Factory to move data between various sources across on-premises and cloud environments
Implemented Azure Data Lake, Azure Data Factory, and Azure Databricks to move and conform data from on-premises to the cloud to serve the company's analytical needs
Expertise in creating and developing applications for the Android operating system using Android Studio, Eclipse IDE, SQLite, Java, XML, the Android SDK, and the ADT plugin
Set up the base Python project structure with the create-python-app package, SSRS, and PySpark
Performed ETL to move data from source systems to destination systems and worked on the data warehouse
Expertise in business intelligence and data visualization tools like Tableau; used Tableau to connect to various sources and build graphs
Extracted and transformed log data files from S3 by scheduling AWS Glue jobs and loaded the transformed data into Amazon Elasticsearch
Managed large datasets using pandas DataFrames and SQL
Responsible for loading data from the BDW Oracle database and Teradata into HDFS using Sqoop
Implemented AJAX, JSON, and JavaScript to create interactive web screens
Used the Python NLTK toolkit for smart MMI interactions (see the NLTK sketch at the end of this section)
Used Power BI as a front-end BI tool to design and develop dashboards, workbooks, and complex aggregate calculations
Used Python's unittest library to test various Python programs and other code (see the unittest sketch at the end of this section)
Applied the Spark DataFrame API to complete data manipulation within the Spark session
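An illustrative NLTK snippet for the MMI interaction work mentioned above; the helper function and sample utterance are hypothetical.

    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    nltk.download("punkt", quiet=True)
    nltk.download("stopwords", quiet=True)

    def keywords(utterance: str) -> list:
        """Tokenize a spoken command and drop stop words before intent matching."""
        tokens = word_tokenize(utterance.lower())
        return [t for t in tokens if t.isalpha() and t not in stopwords.words("english")]

    # Example: keywords("Please open the navigation screen")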
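A minimal unittest example of the kind referenced above; the helper under test is a hypothetical utility, not code from the projects.

    import unittest

    def normalize_amount(value: str) -> float:
        """Hypothetical helper under test: strip currency formatting."""
        return float(value.replace(",", "").replace("$", ""))

    class NormalizeAmountTest(unittest.TestCase):
        def test_strips_formatting(self):
            self.assertEqual(normalize_amount("$1,250.50"), 1250.50)

        def test_plain_number(self):
            self.assertEqual(normalize_amount("42"), 42.0)

    if __name__ == "__main__":
        unittest.main()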