Rupak Roy

Queens, NY

Summary

As a Senior Data Engineer with over 7 years of experience, I have honed my skills in creating resilient and scalable data infrastructures, excelling in both AWS and Azure platforms. My career, marked by impactful tenures at BNY Mellon, Texas Instruments, and U.S. Bank, has centered on deploying complex ETL pipelines, crafting advanced analytics through machine learning models, and leading strategic data governance initiatives. I bring a nuanced understanding of data systems, reinforced by continuous learning and application of the latest technologies in real-world business scenarios.

Overview

Over 7 years of professional experience

Work History

Sr. Data Engineer

U.S. Bank

Remote

02.2021 - Current

Orchestrated data extraction using SSIS packages, streamlining Azure Blob Storage loading
Scheduled and executed ADF pipelines, enhancing data workflow and transformations
Employed ADF data flows for complex data manipulation, ensuring accuracy and efficiency
Enabled seamless data ingestion from Blob Storage to Azure SQL Server using AD integration
Fortified data systems with robust encryption and access control for security
Exposed SQL Server data via Azure API Management for external app consumption
Partnered with diverse teams to design and troubleshoot data integration solutions
Authored comprehensive technical documentation to support data integration maintenance
Maintained expertise in Azure data services, focusing on performance optimization
Centralized business logic application in SQL databases, streamlining data management
Utilized Informatica PowerCenter for comprehensive ETL processes in data warehousing
Enhanced data handling efficiency with advanced Informatica transformations
Enabled reusability and flexibility in ETL processes by parameterizing Informatica mappings
Designed Azure Databricks data pipelines for both real-time and batch processing
Advanced data infrastructure scalability using Azure Data Lake, Blob Storage, and Delta Lake
Analyzed large data sets to deliver actionable insights, addressing key business questions
Automated data processes using cutting-edge metadata management tools, increasing operational efficiency
Implemented data ingestion and integration techniques to support comprehensive data strategies
Established data governance frameworks to maintain data integrity across the organization
Collaborated across functions to implement and monitor data analytics models
Engineered sophisticated machine learning models with Python on Azure ML Studio to automate data analysis, resulting in a 25% increase in operational efficiency for marketing analytics
Led the deployment of AI solutions in Azure, using Cognitive Services and Bot Service to enhance customer interactions, driving a 40% improvement in customer satisfaction scores
Architected and executed data engineering pipelines on Azure Databricks, integrating with Azure Synapse for seamless data warehousing and advanced analytics
Developed Python scripts to automate the training and deployment of machine learning models on Azure, significantly reducing model iteration time from weeks to days
Utilized Python and Azure's AI tools to implement natural language processing (NLP) solutions, improving the accuracy of sentiment analysis by 30% in customer feedback applications

Data Engineer

Texas Instruments

Plano, Texas

09.2019 - 01.2021

Developed a Python script to convert JSON data to XML, supporting various data structures for ADF integration
Configured dynamic data loading from JSON to Azure SQL using ADF's copy activity for diverse data types
Automated data movement with ADF pipelines, enhancing scheduling and monitoring efficiency
Optimized ADF performance through data partitioning, compression, and parallelization techniques
Implemented error handling in data conversion and loading processes for robust data integrity
Engaged with stakeholders to ascertain technical requirements and communicated project progress
Maintained up-to-date knowledge of Python and ADF best practices, focusing on data validation and RESTful API integration
Managed database creation and structured tables for various applications, ensuring scalability and data consistency
Designed AWS-based data architectures, utilizing services like S3, Redshift, and Glue for fault tolerance
Centralized data cataloging with AWS Glue for improved data discoverability and governance
Implemented serverless real-time data processing with AWS Kinesis and Lambda for event-driven architectures
Conducted regular security audits and data encryption for compliance with industry standards
Led ETL pipeline development with AWS Glue, automating data extraction and transformation processes
Designed and implemented performant ETL pipelines using PySpark and Azure Data Factory for data warehousing solutions
Managed AWS security measures, including IAM roles, security groups, and VPC configurations, to uphold data protection standards
Deployed robust AWS cloud infrastructures to support scalable applications, leveraging services such as EC2, S3, and RDS for optimal performance
Coordinated complex data pipeline management, integrating Cloudera clusters with various data systems and BI tools
Developed and optimized data pipelines for efficient data processing and workflow integration
Crafted custom data models and algorithms, improving the utility of diverse data sets
Evaluated new data sources for accuracy and implemented advanced data collection methods
Led initiatives for data acquisition, enhancing the breadth of analytics and reporting capabilities
Utilized a range of programming languages to integrate disparate systems for streamlined operations

Data Engineer

BNY Mellon

New York

02.2017 - 08.2019

Integrated AWS analytics services like Redshift, EMR, and Kinesis to facilitate complex data processing and real-time analytics
Monitored and maintained AWS environments with CloudWatch, ensuring system health and proactive incident management
Provided expert support for complex Hadoop ecosystem components and conducted optimization for Cloudera clusters
Led cross-functional team collaborations to implement disaster recovery solutions for Cloudera
Drove the expansion of Cloudera clusters by assessing and integrating new hardware and software components
Contributed to internal knowledge sharing initiatives for Oracle DB and Cloudera administration best practices
Designed and built scalable data solutions with Hadoop and Spark, integrating with Scala
Managed extensive data extraction, cleaning, and analysis tasks, demonstrating proficiency in SQL, PL/SQL, SSIS, and SSAS
Created and managed Hive data tables and orchestrated data deployment using Jenkins
Utilized AppDynamics within the JBoss environment to enhance application performance monitoring
Architected a multi-layered application framework, incorporating Struts, Hibernate, and Spring
Performed thorough data validation on international datasets, ensuring accuracy and integration
Optimized MapReduce jobs for efficient HDFS utilization and managed data importation using various transformation tools
Transitioned data processing from Hadoop to Spark, leveraging in-memory computing for real-time analytics
Spearheaded a proof of concept with AWS to assess the feasibility of Hadoop as a solution
Ensured high availability of Oracle databases and led a team of DBAs supporting more than 50 Oracle instances
Developed automation scripts to enhance the management of large-scale Cloudera clusters and streamline operations

Education

Bachelor's Degree in Computer Science

Uttara Institute of Business and Technology

Skills

Data Modeling
Machine Learning
Data Warehousing
Performance Tuning
API Development
NoSQL Databases
Data Security
Python Programming
Continuous Integration
Data Analysis
SQL Transactional Replication
Risk Analysis
Big Data Technologies
SQL and Databases
Database Design
Data Migration
Relational Databases
Business Intelligence
Database Structures
Enterprise Resource Planning Software
Data Aggregation Processes
Analytical Thinking
Data Acquisition
Deep Learning
Statistical Analysis
Quantitative Analysis Expertise
Azure
AWS

Languages

English, Fluent
Bengali, Native
Hindi, Fluent

References

References available upon request

Websites

linkedin.com/in/rupak-roy-58872829b
