Rupak Roy

Queens, NY

Summary

As a Senior Data Engineer with over 7 years of experience, I have honed my skills in creating resilient and scalable data infrastructures, excelling in both AWS and Azure platforms. My career, marked by impactful tenures at BNY Mellon, Texas Instruments, and U.S. Bank, has centered on deploying complex ETL pipelines, crafting advanced analytics through machine learning models, and leading strategic data governance initiatives. I bring a nuanced understanding of data systems, reinforced by continuous learning and application of the latest technologies in real-world business scenarios.

Overview

7 years of professional experience

Work History

Sr. Data Engineer

U.S. Bank
Remote
02.2021 - Current
  • Orchestrated data extraction using SSIS packages, streamlining Azure Blob Storage loading
  • Scheduled and executed ADF pipelines, enhancing data workflow and transformations
  • Employed ADF data flows for complex data manipulation, ensuring accuracy and efficiency
  • Enabled seamless data ingestion from Blob Storage to Azure SQL Server using AD integration
  • Fortified data systems with robust encryption and access control for security
  • Exposed SQL Server data via Azure API Management for external app consumption
  • Partnered with diverse teams to design and troubleshoot data integration solutions
  • Authored comprehensive technical documentation to support data integration maintenance
  • Maintained expertise in Azure data services, focusing on performance optimization
  • Centralized business logic application in SQL databases, streamlining data management
  • Utilized Informatica PowerCenter for comprehensive ETL processes in data warehousing
  • Enhanced data handling efficiency with advanced Informatica transformations
  • Enabled reusability and flexibility in ETL processes by parameterizing Informatica mappings
  • Designed Azure Databricks data pipelines for both real-time and batch processing
  • Advanced data infrastructure scalability using Azure Data Lake, Blob Storage, and Delta Lake
  • Analyzed large data sets to deliver actionable insights, addressing key business questions
  • Automated data processes using cutting-edge metadata management tools, increasing operational efficiency
  • Implemented data ingestion and integration techniques to support comprehensive data strategies
  • Established data governance frameworks to maintain data integrity across the organization
  • Collaborated across functions to implement and monitor data analytics models
  • Engineered sophisticated machine learning models with Python on Azure ML Studio to automate data analysis, resulting in a 25% increase in operational efficiency for marketing analytics
  • Led the deployment of AI solutions in Azure, using Cognitive Services and Bot Service to enhance customer interactions, driving a 40% improvement in customer satisfaction scores
  • Architected and executed data engineering pipelines on Azure Databricks, integrating with Azure Synapse for seamless data warehousing and advanced analytics
  • Developed Python scripts to automate the training and deployment of machine learning models on Azure, significantly reducing model iteration time from weeks to days
  • Utilized Python and Azure's AI tools to implement natural language processing (NLP) solutions, improving the accuracy of sentiment analysis by 30% in customer feedback applications

Data Engineer

Texas Instruments
Plano, Texas
09.2019 - 01.2021
  • Developed a Python script to convert JSON data to XML, supporting various data structures for ADF integration
  • Configured dynamic data loading from JSON to Azure SQL using ADF's copy activity for diverse data types
  • Automated data movement with ADF pipelines, enhancing scheduling and monitoring efficiency
  • Optimized ADF performance through data partitioning, compression, and parallelization techniques
  • Implemented error handling in data conversion and loading processes for robust data integrity
  • Engaged with stakeholders to ascertain technical requirements and communicated project progress
  • Maintained up-to-date knowledge of Python and ADF best practices, focusing on data validation and RESTful API integration
  • Managed database creation and structured tables for various applications, ensuring scalability and data consistency
  • Designed AWS-based data architectures, utilizing services like S3, Redshift, and Glue for fault tolerance
  • Centralized data cataloging with AWS Glue for improved data discoverability and governance
  • Implemented serverless real-time data processing with AWS Kinesis and Lambda for event-driven architectures
  • Conducted regular security audits and data encryption for compliance with industry standards
  • Led ETL pipeline development with AWS Glue, automating data extraction and transformation processes
  • Designed and implemented performant ETL pipelines using PySpark and Azure Data Factory for data warehousing solutions
  • Managed AWS security measures, including IAM roles, security groups, and VPC configurations, to uphold data protection standards
  • Deployed robust AWS cloud infrastructures to support scalable applications, leveraging services such as EC2, S3, and RDS for optimal performance
  • Coordinated complex data pipeline management, integrating Cloudera clusters with various data systems and BI tools
  • Developed and optimized data pipelines for efficient data processing and workflow integration
  • Crafted custom data models and algorithms, improving the utility of diverse data sets
  • Evaluated new data sources for accuracy and implemented advanced data collection methods
  • Led initiatives for data acquisition, enhancing the breadth of analytics and reporting capabilities
  • Utilized a range of programming languages to integrate disparate systems for streamlined operations

Data Engineer

BNY Mellon
New York
02.2017 - 08.2019
  • Integrated AWS analytics services like Redshift, EMR, and Kinesis to facilitate complex data processing and real-time analytics
  • Monitored and maintained AWS environments with CloudWatch, ensuring system health and proactive incident management
  • Provided expert support for complex Hadoop ecosystem components and conducted optimization for Cloudera clusters
  • Led cross-functional team collaborations to implement disaster recovery solutions for Cloudera
  • Drove the expansion of Cloudera clusters by assessing and integrating new hardware and software components
  • Contributed to internal knowledge sharing initiatives for Oracle DB and Cloudera administration best practices
  • Designed and built scalable data solutions with Hadoop and Spark, integrating with Scala
  • Managed extensive data extraction, cleaning, and analysis tasks, demonstrating proficiency in SQL, PL/SQL, SSIS, and SSAS
  • Created and managed Hive data tables and orchestrated data deployment using Jenkins
  • Utilized AppDynamics within the JBoss environment to enhance application performance monitoring
  • Architected a multi-layered application framework, incorporating Struts, Hibernate, and Spring
  • Performed thorough data validation on international datasets, ensuring accuracy and integration
  • Optimized MapReduce jobs for efficient HDFS utilization and managed data importation using various transformation tools
  • Transitioned data processing from Hadoop to Spark, leveraging in-memory computing for real-time analytics
  • Spearheaded a proof of concept with AWS to assess the feasibility of Hadoop as a solution
  • Ensured high availability of Oracle databases and led a team of DBAs supporting more than 50 Oracle instances
  • Developed automation scripts to enhance the management of large-scale Cloudera clusters and streamline operations

Education

Bachelor's Degree in Computer Science

Uttara Institute of Business and Technology

Skills

  • Data Modeling
  • Machine Learning
  • Data Warehousing
  • Performance Tuning
  • API Development
  • NoSQL Databases
  • Data Security
  • Python Programming
  • Continuous Integration
  • Data Analysis
  • SQL Transactional Replication
  • Risk Analysis
  • Big Data Technologies
  • SQL and Databases
  • Database Design
  • Data Migration
  • Relational Databases
  • Business Intelligence
  • Database Structures
  • Enterprise Resource Planning Software
  • Data Aggregation Processes
  • Analytical Thinking
  • Data Acquisition
  • Deep Learning
  • Statistical Analysis
  • Quantitative Analysis Expertise
  • Azure
  • AWS

Languages

  • English, Fluent
  • Bengali, Native
  • Hindi, Fluent

References

References available upon request

Languages

  • English, Professional
  • Bengali, Professional
  • Hindi, Professional
