Soumyadeep De

Malvern, PA

Summary

Accomplished Data Engineer with over 21 years of global consulting experience in Banking, Telecom, Public Sector, Retail, Manufacturing, Healthcare and Education. Expertise includes designing analytics solutions, performing complex data extraction, processing batch and streaming data, and managing full-lifecycle (SDLC) data warehousing projects. Proficient in AWS, Azure and Snowflake, successfully leading data engineering projects for Fortune 500 clients. Recognized for leadership capabilities and for delivering innovative solutions that build trusted client relationships.

Overview

21 years of professional experience
1 Certification

Work History

Sr. Cloud Architect

Computer Aid Inc.
03.2020 - Current


  • Implemented robust data pipelines (Synapse Pipelines) to ingest, transform & move data from diverse sources into Synapse,
    orchestrating complex workflows for ETL/ELT processes
  • Generated denormalized & flattened dimensional model of the analytical schema from highly normalized (3NF) transactional
    sources
  • Monitored, troubleshot and optimized Synapse workloads (SQL queries, Spark jobs, pipelines) for speed, cost and
    resource utilization, using techniques like workload management, cluster configuration and data skew prevention
  • Developed solutions using SQL (T-SQL, Spark SQL) for data exploration, reporting (Power BI integration) & advanced
    analytics/ML models (PySpark notebooks) within the single Synapse Studio environment
  • Built robust, scalable & reusable data pipelines using various ADF activities (Copy, Data Flow, Lookup, ForEach) and control
    flow logic (If Condition, Switch) to move and transform data from diverse sources to destinations like Azure Data Lake
    Storage (ADLS), Azure SQL Database, or Synapse.
  • Implemented Continuous Integration/Continuous Deployment (CI/CD) for ADF pipelines by integrating with Azure DevOps,
    using Git for version control & deploying pipelines as ARM (Azure Resource Manager) templates across different
    environments (Dev, Test, Prod)
  • Adopted security best practices, managed access control with Azure Active Directory (Azure AD) and Role-Based Access
    Control (RBAC), utilized Azure Key Vault for secrets & ensured compliance for data at rest and in transit.
  • Seamlessly connected ADF with other Azure services like Azure Blob Storage, Azure SQL DB, Azure Synapse Analytics &
    Power BI, managing Linked Services and Datasets for comprehensive data solutions.
  • Optimized Spark jobs and data processing workflows to ensure efficiency, scalability & cost-effectiveness, including
    managing Azure Databricks compute clusters (configuring auto-scaling, spot instances), implementing data skipping with
    techniques like Z-ordering & leveraging Delta Lake features for efficient data handling.
  • Worked with cross-functional teams (data scientists, data analysts, etc.) and integrated data solutions with other key
    Azure services, such as Azure Synapse Analytics for data warehousing, Azure Machine Learning for ML model deployment
    (using MLflow) & Power BI for data visualization and reporting.
  • Used PolyBase, an import/export tool that mounts data in external tables sourced from HDFS or Azure Data Lake Store,
    to bypass traditional ETL processes entirely, saving critical project resources.
  • Connected HDInsight with other Azure services such as Azure Data Lake Storage (ADLS Gen2), Azure Synapse Analytics,
    Azure Cosmos DB & Power BI to create end-to-end analytics solutions
  • Implemented complex event processing (CEP) using the Azure Stream Analytics SQL query language, with windowed
    aggregates (e.g., Tumbling, Hopping, Sliding & Session windows), temporal JOIN operations using DATEDIFF & temporal
    analytic functions (ISFIRST, LAST, LAG).
  • Utilized native Airflow Operators and Sensors (e.g., for Azure Data Factory, Azure Databricks, Azure Blob Storage, Azure
    SQL Database) to manage task dependencies and flow of information between disparate systems.
  • Monitored job performance using Azure Monitor metrics and adjusted the number of allocated Streaming Units (SUs) to
    ensure each job could handle the data volume and complexity of its stateful queries, especially those with high cardinality
    within windows.

Data Engineer

Hexaware Technologies
08.2018 - 03.2020
  • Designed & enhanced batch and streaming data pipelines on AWS using Glue jobs, Glue Workflows & Python-based ETL
    for diverse source systems.
  • Implemented robust ELT patterns that land raw data in S3 and transform it into curated layers optimized for Redshift and Athena
    consumption.
  • Optimized Redshift data warehouses using dimensional models, sort keys, distribution keys & compression encodings to
    support BI and advanced analytics workloads.
  • Monitored and tuned Redshift performance (query plans, workload management/WLM, concurrency scaling) while controlling
    storage and compute costs
  • Architected and managed S3-based data lakes with partitioned, columnar formats (Parquet/ORC) and Glue Data Catalog tables
    to enable cost-efficient Athena querying.
  • Improved Athena query performance and cost through partition pruning, schema evolution strategies & table design aligned
    to access patterns.
  • Developed reusable Python libraries and Spark/PySpark jobs (on Glue or EMR) for complex data transformations, validations
    & business rule implementations.
  • Automated orchestration, error handling & notifications for data workflows using Python, Step Functions & event-driven
    patterns
  • Implemented data quality checks, reconciliation & anomaly detection on pipelines, with results surfaced via logs/metrics and
    integrated into CI/CD
  • Built CI/CD pipelines for data workflows and schemas to promote changes safely across dev, test & prod environments, with
    automated validation gates using AWS CodeCommit/GitHub, CodeBuild, CodeDeploy, CodePipeline, CodeArtifact & CodeGuru
  • Performed technical reviews, data validation & end-to-end testing of ETL objects, along with source data analysis and data
    profiling
  • Troubleshot reported bugs in the existing ETL code using both Informatica and PL/SQL technologies

Managing Consultant

IBM CORP
06.2013 - 08.2018
  • Demonstrated visualization capabilities with tools like Tableau and deployed solutions using AWS CI/CD techniques
  • Captured requirements in high-level functional documents and translated them into technical design documents
  • Migrated existing ETL jobs to Informatica/ODI/SSIS
  • Assimilated HR data from EBS ERP sources using AWS Glue & loaded it into AWS Redshift
  • Extracted student data from SQL Server sources using SSIS and matched its existing representation in the SSRS data model
    to build the same report with Tableau
  • Designed a dimensional model associating student data with individual schools and created visualizations enabling School
    Principals and Teachers to view student demographics
  • Created insightful, highly interactive dashboards allowing the Chief of Schools as well as the reporting ILEDs to drill
    down on individual Principals & Assistant Principals across different districts and identify the most suitable candidate
    for a vacant position at a given school.
  • Worked on a PoC for analyzing ERP data on AWS infrastructure, extracting and transforming data using AWS Glue with
    Redshift Spectrum serverless queries.
  • Demonstrated problem solving capabilities using Big Data technologies: Hadoop & Apache Spark
  • Liaised with clients and development teams to understand their needs and recommended solutions.
  • Consistently surpassed performance targets while ensuring high-quality workmanship.
  • Identified new business opportunities, driving significant revenue growth for the organization.

Solution Architect

Cognizant Technology Solutions
08.2012 - 05.2013
  • Upgraded the OBIEE environment from version 11.1.1.5.0 to 11.1.1.7.0 (the latest version as of Apr 2013)
  • Provided documentation on the challenges encountered, with supporting screenshots, while proceeding through the upgrade steps.
  • Upgraded the Pre-Dev Windows server first, then the Sun Solaris environment.
  • Raised issues related to bugs reported against the newer version and engaged in calls with Oracle to obtain workaround
    fixes and solutions.
  • Used OPatch to upgrade to version 11.1.1.7.1 per Oracle's released patch documentation
  • Improved system performance via rigorous application testing and continuous integration practices.
  • Contributed technical expertise in project planning, guiding technology selection and strategies.

Package Solution Consultant

IBM India Pvt Ltd
06.2009 - 08.2012
  • Established cross-functional relationships and directed all requirements gathering, data analysis and documentation for
    enterprise projects to centralize data into a multi-function data warehouse system with advanced reporting processes.
  • Assisted in developing business intelligence strategies in coordination with data architect.
  • Managed project scope of work, cost estimates, budget, workforce, resource management &
    delivery.
  • Managed full life cycle development of business intelligence systems for company.
  • Reviewed column mapping from EBS ERP sources to the targets in OBIEE datamart, thereby addressing the
    transformation of the data as it got de-normalized into warehousing tables
  • Coordinated with users and developers on all data modeling techniques
  • Identified areas for improvement through analysis of system performance metrics.
  • Collaborated with product teams to ensure aligned packaging solutions for new offerings.

Analytics Developer

TATA Consultancy Services
09.2004 - 02.2009
  • Analyzed project requirements to identify key system enhancements and improvements.
  • Monitored team performance, driving accountability and meeting project milestones.
  • Developed a Custom Model and all three layers (Physical, BMM and Presentation) in the OBIEE RPD, incorporating the
    new stars
  • Migrated Informatica mappings and workflows from the Development to the Integration environment
  • Validated the integrity of new data models and applied OBIEE features to fit detailed business requirements
  • Developed functional use case requirements to match existing OBIA/Siebel Analytics capabilities
  • Worked with the subject area developers & ETL group to refine the OBIEE RPD to meet the reporting requirements of
    the business analysts

Education

Master of Engineering - Information Technology

Jadavpur University
Kolkata, IND
12-2008

Bachelor of Engineering - Electronics & Telecommunication

University of Kalyani
Kolkata, IND
07-2004

Skills

  • Power BI
  • Azure Data Factory
  • Azure Databricks
  • Microsoft Fabric
  • Azure Cosmos DB
  • PolyBase
  • SSIS (SQL Server Integration Services)
  • SSRS (SQL Server Reporting Services)
  • Azure Stream Analytics
  • Azure HDInsight
  • Apache Airflow
  • Apache Spark
  • Python
  • Linux
  • DevOps methodologies
  • Multi-cloud management
  • Monitoring and logging
  • Critical thinking
  • Excellent communication

Accomplishments

  • History of leading critical IT workstreams on Business Intelligence, ML/AI projects resulting in revenues of up to $10M
  • Led and implemented one of the largest Oracle BI Apps implementations in North America (2016-2018) at Motorola
  • Built strong center of excellence at IBM CORP for OBIEE, ETL, Interfaces, Data Engineering, DevOps
  • Business Intelligence Architect for a very large CAI account with 100+ global practitioners in a demanding client situation
  • Recognized and promoted for performance in 2005, 2011, 2016 and 2024

Certification

  • Azure Data Engineer – Associate (2024, 2023, 2022, 2020)
  • AWS Solutions Architect – Associate (2020)
  • Hadoop HDFS (2020)
  • Oracle Certified Associate SQL/PLSQL (2007)
  • SQL Server – Integration Services (2021)
  • Oracle Certified Specialist – BI Apps CRM (2013)
  • Oracle Certified Specialist – BI Apps ERP (2013)
  • Oracle Certified Specialist – OBIEE11g (2013)
  • PySpark (2020)
  • Oracle BIEE Cloud Service Specialist (2016)

Languages

English - Full Professional
Spanish - Limited Working
Bengali - Native or Bilingual
