Experienced data engineering professional with a proven track record of designing and managing efficient data systems. Brings extensive experience in designing scalable data architectures, optimizing data pipelines, and leveraging big data technologies, with recognized expertise in data warehousing and ETL processes. Proficient in SQL, Python, Spark, and cloud platforms, with a keen ability to align technical solutions with business objectives. Collaborative team leadership, adaptability, and a results-driven approach have been key to delivering impactful results on evolving project requirements.
Overview
10 years of professional experience
Work History
Lead Data Engineer
ZYTER|Trucare
02.2024 - Current
Managed large-scale data storage and processing using Amazon S3 and HDFS, ensuring efficient data availability, fault tolerance, and integration with data lakes and warehouses
Designed and developed scalable data pipelines for business intelligence (BI) and analytics, integrating SnapLogic and Azure Data Factory (ADF) for optimized ETL/ELT processes
Automated workflows for seamless data extraction, transformation, and loading into data warehouses for real-time insights
Built data models and optimized SQL queries for BI tools like Tableau, Power BI, and Looker, enabling data-driven decision-making
Created interactive dashboards and visualizations to present actionable insights across business units
Integrated Generative AI solutions into production systems for automated content generation, enhancing marketing automation and content creation workflows
Optimized data storage strategies for unstructured data (e.g., text, images) used in training AI models
Optimized rendering performance in ReactJS by implementing lazy loading, memoization, and shouldComponentUpdate to enhance the overall user experience
Configured and optimized NiFi processors for seamless integration with various data sources such as databases, APIs, and file systems
Migrated on-premises PostgreSQL databases to AlloyDB, ensuring high availability, scalability, and improved performance for data-intensive applications and workloads
Designed and implemented scalable data architectures in Google BigQuery to support large-scale analytics and reporting across diverse data sources and business units
Designed and built automated ETL pipelines for data acquisition from diverse sources (APIs, databases, third-party vendors, flat files), ensuring reliable and timely data delivery for BI purposes
Designed, developed, and deployed scalable data pipelines using GCP services such as BigQuery, Dataflow, Cloud Storage, and Cloud Pub/Sub to enable real-time and batch data processing
Managed task dependencies and scheduling within DAGs to ensure proper execution order, minimizing delays and optimizing resource usage
Optimized data pipelines by leveraging BigQuery's partitioning, clustering, and query optimization techniques to reduce processing time and cost for large datasets (see the partitioning sketch after this role)
Developed and managed end-to-end data pipelines in Azure Databricks using PySpark and Scala, ensuring efficient data processing and transformation for large-scale datasets
Automated ETL workflows using Azure Data Factory and AWS Glue, integrating Spark Streaming and Apache Kafka for real-time data processing
Implemented data integrity checks and quality assurance measures to ensure accuracy across datasets
Utilized business intelligence tools like Looker, Tableau, and Power BI to design intuitive, high-performance dashboards and reporting solutions, providing actionable insights to business stakeholders
Built and managed data warehouses in BigQuery, optimizing schema design, partitioning, and clustering for high-performance querying and cost efficiency across large datasets
Leveraged DAGs to automate data extraction, transformation, and loading (ETL) processes, significantly reducing manual intervention and improving process efficiency
Integrated RESTful APIs and GraphQL endpoints with ReactJS to enable seamless data flow between the front end and backend services
Developed and orchestrated data workflows using Cloud Composer (Airflow), automating ETL processes and ensuring timely, reliable data movement between systems in the GCP ecosystem
Implemented NiFi's data provenance features to track data lineage, enabling enhanced visibility and traceability of data across complex workflows
Integrated AlloyDB with GCP BigQuery and other cloud services to enable hybrid data processing solutions, combining real-time transactional data with large-scale analytics
Collaborated with data analysts and business stakeholders to define key metrics and KPIs, creating data models and BI reports that support data-driven decision-making across the organization
Applied advanced data transformation techniques, including text cleaning, tokenization, and image preprocessing, to prepare datasets for Generative AI model training
Developed complex SQL queries and optimized database performance through indexing and restructuring
Troubleshot and resolved production issues related to data ingestion, transformation, and integration, ensuring timely incident resolution
Developed custom processors and controller services in Java to extend NiFi's out-of-the-box capabilities, addressing unique business requirements
Developed and managed BigQuery ETL processes, integrating data from Google Cloud Storage, Cloud Pub/Sub, and other sources to deliver high-quality, real-time data insights
Designed and implemented machine learning models using Azure Machine Learning Studio, ensuring efficient model training, evaluation, and deployment pipelines
Collaborated with backend teams to integrate ReactJS with Node.js/Express APIs, ensuring efficient data handling and state management
Leveraged AlloyDB's advanced indexing and query optimization features to ensure efficient data retrieval and processing, reducing query latency for mission-critical applications
Integrated Azure Synapse Analytics, AWS Glue, and Apache Airflow for seamless orchestration of data flows and transformation tasks
Leveraged Microsoft Fabric for scalable data integration and real-time analytics solutions
Managed data acquisition and transformation processes, ensuring data quality, integrity, and consistency before feeding into business intelligence platforms for analysis and reporting
Implemented proactive monitoring and alerting mechanisms using tools such as Grafana, Prometheus, and ELK stack to detect potential system failures
Collaborated with UX/UI designers to translate wireframes and mockups into responsive, high-performance web interfaces using ReactJS and CSS frameworks like Bootstrap and Material-UI
Collaborated with cross-functional teams to build and deploy machine learning models in Azure Databricks, utilizing its integration with Azure Machine Learning for scalable and optimized training
Designed and implemented star and snowflake schema for optimized reporting and analytics
Utilized Snowflake and AWS Athena for optimized querying and data storage
Managed performance tuning, indexing, and schema designs for enterprise applications
Collaborated with data teams to define DAG structures that ensure clear task execution order, error handling, and logging to support complex data pipeline requirements
Designed and deployed SSIS packages to extract, transform, and load data from multiple sources into centralized data warehouses, ensuring seamless data integration
Automated data migration, cleansing, and scheduled updates using SSIS, reducing manual effort by 80% and improving operational efficiency
Executed real-time data streaming solutions with Apache Kafka and Spark Streaming, facilitating immediate analytics and decision-making (see the streaming sketch after this role)
Optimized data ingestion and processing workflows using Microsoft Fabric tools
Led efforts in managing data security and compliance by leveraging NiFi's built-in data encryption, access control, and authentication mechanisms
Built and maintained data lakes by integrating Azure Databricks with Azure Data Lake Storage, enabling efficient data storage and real-time analytics for business intelligence
Automated model deployment and monitoring using Azure ML pipelines, streamlining the end-to-end workflow from data preparation to production
Built and maintained scalable data warehouses in Snowflake and Amazon Redshift, optimizing storage configurations, partitioning, and query performance for large datasets
Developed and executed complex queries to extract actionable business insights
Developed predictive models and analytics dashboards using Alteryx and Power BI, enabling actionable insights for strategic decision-making
Integrated external data sources to enhance model accuracy and reporting capabilities
Collaborated with cross-functional teams to perform root cause analysis (RCA) and implement preventive measures to avoid recurring issues in production
Automated data ingestion workflows using Google Cloud Dataflow and Cloud Composer, ensuring seamless and reliable data movement into BigQuery for downstream analytics
Configured and managed cloud resources on AWS, Azure, and GCP for optimized computing and storage, including EC2, Synapse Analytics, Hadoop, and RDS
Ensured data security, scalability, and fault tolerance across cloud environments
Implemented RESTful API services for efficient data interchange, enabling system integration and data accessibility
Developed custom reporting solutions using SQL and DAX to support cross-departmental decision-making
Designed and implemented high-availability and disaster recovery strategies using AlloyDB's multi-zone deployment capabilities, ensuring business continuity and minimal downtime
Leveraged Azure Machine Learning's automated machine learning (AutoML) feature to accelerate model development and improve model accuracy for business-critical applications
Contributed to data governance and metadata management using tools like AWS Athena and Apache NiFi, ensuring compliance with data policies and high data integrity
Built infrastructure for real-time content generation using Generative AI models, supporting low-latency applications such as chatbots and personalized recommendations
Automated deployment and model monitoring using Scikit-learn
Integrated GCP BigQuery with external tools and services such as Google Analytics, Firebase, and Google Cloud AI, providing a unified solution for data analysis and business intelligence
Automated deployment processes using Jenkins, enhancing CI/CD pipelines and streamlining code deployment to AWS and Azure environments
Optimized ETL processes and scheduled automated pipeline execution using ADF triggers
Integrated Azure ML with Azure Databricks for distributed training and high-performance machine learning on large-scale datasets
Implemented security and data governance policies within GCP, using IAM (Identity and Access Management), Cloud Security Command Center, and VPC for secure and compliant data handling
Used Webpack and Babel for building and bundling ReactJS applications, ensuring compatibility across browsers and reducing JavaScript bundle sizes
Led data migration projects for seamless extraction, transformation, and loading (ETL) between on-premise systems and cloud platforms
Integrated Azure Data Factory pipelines with Snowflake and Amazon RDS for consistent data orchestration
Leveraged Tableau, Power BI, and Looker to create visually compelling analytics dashboards, driving data-driven decisions
Implemented advanced analytics techniques for real-time performance monitoring and business insight generation
Architected and maintained secure data lakes using AWS technologies, including EMR, Lambda, and Redshift, ensuring robust and compliant storage solutions for large-scale data sets
Designed systems for efficient querying and data exchange in big data environments
Environment: AWS, Azure, Snowflake, Redshift, Databricks, Apache Spark, Hadoop, Tableau, Power BI, Looker, SQL, PL/SQL, Python, Generative AI, SnapLogic, Azure Data Factory (ADF), HDFS, S3, Lambda, Kafka, Cassandra, Kubernetes, Elasticsearch, Data Governance, Cloud Migration, Data Lakes, Real-Time Analytics, Machine Learning, API Development, Data Integration, Apache Airflow, and more.
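Illustrative sketch for the BigQuery partitioning and clustering work noted above. This is a minimal example assuming the google-cloud-bigquery client library; the dataset, table, and column names are hypothetical placeholders, not the project's actual schema.

```python
# Minimal sketch: date-partitioned, clustered BigQuery table plus a
# partition-filtered query. Assumes the `analytics` dataset already exists.
from google.cloud import bigquery

client = bigquery.Client()  # uses application default credentials

ddl = """
CREATE TABLE IF NOT EXISTS analytics.events
(
  event_date  DATE,
  customer_id STRING,
  event_type  STRING,
  payload     JSON
)
PARTITION BY event_date
CLUSTER BY customer_id, event_type
"""
client.query(ddl).result()

# Downstream queries should filter on the partition column so BigQuery
# scans only the relevant partitions, which is where the cost savings come from.
job_config = bigquery.QueryJobConfig(
    query_parameters=[bigquery.ScalarQueryParameter("day", "DATE", "2024-01-01")]
)
rows = client.query(
    """
    SELECT customer_id, COUNT(*) AS events
    FROM analytics.events
    WHERE event_date = @day
    GROUP BY customer_id
    """,
    job_config=job_config,
).result()
```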
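Illustrative sketch for the Kafka and Spark Streaming integration noted above, assuming PySpark Structured Streaming with the spark-sql-kafka connector on the classpath; broker, topic, schema, and paths are hypothetical placeholders.

```python
# Minimal PySpark Structured Streaming sketch: read JSON events from Kafka,
# parse them, and append to a checkpointed parquet sink for fault tolerance.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

schema = StructType([
    StructField("order_id", StringType()),
    StructField("status", StringType()),
    StructField("updated_at", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")
    .option("subscribe", "orders")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers the payload as bytes; parse the JSON value into columns.
parsed = raw.select(
    F.from_json(F.col("value").cast("string"), schema).alias("o")
).select("o.*")

query = (
    parsed.writeStream.format("parquet")
    .option("path", "/mnt/streams/orders")
    .option("checkpointLocation", "/mnt/streams/_checkpoints/orders")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```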
Cloud Data Engineer
HSBC
10.2022 - 09.2023
Developed scalable ETL pipelines leveraging Azure Databricks, Snowflake, PySpark, Dataflow, and Azure Data Factory to ingest, transform, and enrich data stored in Azure Data Lake (see the PySpark sketch after this role)
Created data ingestion pipelines using Spark SQL and integrated with Cosmos DB for seamless data flow
Designed and implemented real-time data processing workflows with SnapLogic Ultra Pipelines and Azure, ensuring low-latency delivery for critical business operations
Integrated SnapLogic with multiple cloud platforms (Azure, AWS, GCP) for data processing across environments
Built efficient data transformation workflows to cleanse, enrich, and prepare datasets for analytics and reporting
Utilized tools like SnapLogic, ADF, and Alteryx for dynamic workflows, ensuring data quality and consistency
Automated data quality checks to ensure accuracy in SQL Server and cloud databases
Led migration efforts from legacy systems to modern cloud data architecture, using Azure Data Factory, GCP, Databricks, and Snowflake to integrate, store, and process large datasets
Migrated legacy on-premises data warehouses to Snowflake, ensuring minimal downtime and data integrity
Built custom SQL queries and BigQuery scripts to extract, transform, and load (ETL) data, enabling business stakeholders to generate actionable insights from large datasets
Integrated Cloud-based BI solutions with on-premise databases and cloud storage, ensuring seamless data flow between acquisition, processing, and presentation layers
Worked with development teams to implement AlloyDB's automated backups, ensuring that data is securely backed up and available for recovery in case of failure
Leveraged GCP Pub/Sub and Dataflow to build real-time data streaming pipelines, enabling immediate insights and automated decision-making across data sources
Integrated DAGs with external systems and APIs to fetch, process, and store data, ensuring seamless data flow across various cloud and on-premise environments
Integrated data into BI platforms such as Power BI, Tableau, and SQL Server, enabling advanced analytics and reporting
Built and maintained data marts to improve the performance and usability of BI solutions
Designed and executed real-time dashboards and CX scorecards using Qualtrics for actionable insights
Automated deployment and scaling of AlloyDB instances using Terraform and Google Cloud Deployment Manager, streamlining database provisioning and configuration in cloud environments
Integrated BigQuery with Google Data Studio, Looker, and other BI tools to deliver real-time dashboards and visualizations that support data-driven decision-making
Automated data collection from various sources using APIs, custom scripts, and GCP Dataflow, reducing manual data entry and improving data accuracy and timeliness
Optimized NiFi cluster performance for high availability and fault tolerance, ensuring reliable data processing at scale
Developed and maintained complex single-page applications (SPAs) using ReactJS, ensuring a seamless user experience and optimal performance across browsers
Created interactive notebooks in Azure Databricks for data exploration, transformation, and visualization, enabling stakeholders to access actionable insights with ease
Managed data versioning and change management processes, ensuring smooth updates to production data workflows and minimizing disruptions
Coordinated the deployment and scaling of NiFi in a multi-node environment to support large-scale, real-time data integration projects
Built and maintained data marts and data lakes to centralize and organize large volumes of data, enabling advanced analytics and reporting through business intelligence platforms
Applied machine learning models within Microsoft Fabric for uncovering actionable business insights
Integrated Azure Synapse Analytics and Power BI for advanced data visualization and model performance tracking
Collaborated with application teams to optimize database schema design and indexing strategies in AlloyDB, resulting in improved query performance and system responsiveness
Implemented dynamic DAG generation based on external configurations or parameters, allowing for reusable and flexible data pipelines that can scale with business needs
Integrated NiFi with other big data technologies, including Hadoop, Spark, and Kafka, to support end-to-end data pipelines for analytics and reporting
Automated infrastructure provisioning using Google Cloud Deployment Manager and Terraform, streamlining resource management and ensuring reproducible environments for data engineering tasks
Implemented component-based architecture using ReactJS to create modular, reusable UI components that adhere to DRY (Don't Repeat Yourself) principles
Implemented batch and stream processing workflows using Azure Databricks and Structured Streaming to handle both real-time and historical data processing needs
Developed efficient data aggregation and transformation logic to support complex queries and reporting, optimizing the performance of BI tools and dashboards
Collaborated with cross-functional teams to create data-driven solutions by applying predictive modeling, clustering, and classification techniques in Azure ML
Leveraged ReactJS hooks such as useState, useEffect, and useReducer to manage component state and side effects, reducing code complexity and enhancing maintainability
Implemented robust data governance and security protocols across SnapLogic, Snowflake, and Azure to ensure compliance with organizational standards
Managed role-based access control (RBAC) in Snowflake and utilized Azure Key Vault for enhanced security
Implemented complex data transformations and robust error-handling mechanisms, improving data accuracy and reducing processing errors by 95%
Enhanced SSIS packages to handle large datasets (1TB+ daily), reducing processing time by 50% through parallel execution and resource optimization
Applied BigQuery ML to build and deploy machine learning models directly within the data warehouse, enabling predictive analytics and advanced data insights without data movement
Provided technical guidance and best practices on NiFi's use, helping cross-functional teams improve data flow efficiency and reduce time to insight
Utilized Airflow's native features such as task retries, XCom, and sensors within DAGs to handle failures, manage task state, and create robust, fault-tolerant data workflows (see the DAG sketch after this role)
Developed and maintained CI/CD pipelines using GitHub Actions, Azure DevOps, and DBT for streamlined code deployment, version control, and workflow automation
Automated API integrations with SnapLogic's API Management to ensure seamless data exchanges and integrations
Utilized GCP's BigQuery, Compute Engine, and Kubernetes for scalable data processing and application deployment
Configured Google Cloud Storage and Cloud SQL for optimized data storage and migration
Implemented Cloud Load Balancing and Cloud CDN for efficient networking during migration
Ensured compliance with data governance frameworks and security policies during the data acquisition process, maintaining proper data access controls and audit trails for sensitive business data
Managed and monitored machine learning experiments through Azure ML's tracking capabilities, ensuring high-quality models and continuous improvement
Maintained and optimized databases in production environments, including performance tuning, index optimization, and data consistency checks
Monitored and optimized SQL Server and Snowflake databases for improved performance, query execution times, and storage efficiency
Employed tools such as SQL Trace, TKPROF, and AWR reports to identify and resolve performance bottlenecks
Integrated third-party libraries and frameworks with ReactJS, such as Chart.js, D3.js, and Redux-Saga, to implement data visualizations and handle complex state management logic
Implemented model versioning and management using Azure Machine Learning's model registry, ensuring smooth version control and governance for machine learning models
Used PySpark, Databricks, and Talend for data transformation and migration tasks
Optimized Spark Streaming for real-time data processing from sources like Apache Flume
Collaborated with cross-functional teams to design scalable data models, ensuring data consistency, security, and compliance across cloud platforms and databases
Facilitated the design of KPI dashboards and CX programs to track customer feedback and performance
Led successful cloud migrations from on-premise systems to cloud-based solutions, ensuring data integrity and cost optimization
Implemented data-loading strategies for Data Lakes and Data Warehouses using ADF and Snowflake for seamless and automated data ingestion
Designed and deployed real-time dashboards and KPI reports using Qualtrics to provide insights into customer feedback trends
Configured Power BI for interactive data visualizations that drive business decisions
Developed data preprocessing pipelines within Azure ML for large-scale datasets, integrating with Azure Data Lake, Azure SQL Database, and other data storage solutions
Applied best practices for data backup, disaster recovery, and data security to ensure the integrity and confidentiality of production data
Conducted extensive SQL tuning and optimization tasks, including query restructuring, indexing, and partitioning to improve the overall performance of SQL Server and Snowflake queries
Managed and optimized API integrations using SnapLogic, reducing operational costs and enhancing scalability
Automated data movement and ETL processes for seamless integration between enterprise systems and cloud platforms
Used Azure Monitor and Log Analytics for pipeline health monitoring, ensuring proactive error resolution and performance optimization
Led debugging and optimization efforts to ensure high availability and reliability of ETL processes
Committed to ongoing professional development, staying up-to-date with advancements in Snowflake, Azure, GCP, and emerging data engineering technologies
Applied domain-specific knowledge to enhance the impact of engineering solutions in the Finance and Banking sectors
Environment: Snowflake, Databricks, Azure Data Factory (ADF), Azure Synapse Analytics, GCP, Power BI, PySpark, SQL, Data Lakes, ETL, SnapLogic, Machine Learning, Alteryx, Spark Scala, Data Governance, CI/CD, DBT, Cosmos DB, AWS, API Integration, Python (NumPy, Pandas), Kubernetes, Talend, SQL Server.
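Illustrative sketch of the Airflow patterns noted above (task retries, a sensor, and XCom); the DAG id, file path, and task logic are hypothetical placeholders, not the production DAGs.

```python
# Minimal Airflow 2.x sketch: a daily DAG with retries, a file sensor gate,
# and XCom used to pass a value between tasks.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.sensors.filesystem import FileSensor


def extract(**context):
    # Push a value to XCom so downstream tasks can read it.
    context["ti"].xcom_push(key="row_count", value=42)


def load(**context):
    row_count = context["ti"].xcom_pull(task_ids="extract", key="row_count")
    print(f"loading {row_count} rows")


with DAG(
    dag_id="daily_ingest",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 3, "retry_delay": timedelta(minutes=5)},
) as dag:
    wait_for_file = FileSensor(
        task_id="wait_for_file",
        filepath="/data/incoming/trigger.ok",
        poke_interval=60,
    )
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    wait_for_file >> extract_task >> load_task
```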
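Illustrative sketch of the Databricks/PySpark enrichment pipeline noted above; storage paths, column names, and the Delta sink are hypothetical placeholders.

```python
# Minimal PySpark sketch: read raw JSON from the data lake, deduplicate,
# join to a curated reference table, and write a partitioned Delta table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

raw = spark.read.json("abfss://raw@datalake.dfs.core.windows.net/transactions/")
customers = spark.read.format("delta").load("/mnt/curated/customers")

enriched = (
    raw.dropDuplicates(["transaction_id"])
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .join(customers, on="customer_id", how="left")
    .withColumn("ingested_at", F.current_timestamp())
)

(
    enriched.write.format("delta")
    .mode("overwrite")
    .partitionBy("transaction_date")
    .save("/mnt/curated/transactions_enriched")
)
```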
Sr. Data Engineer & Python Developer
Tata Consultancy Services
09.2020 - 04.2022
Implemented PolyBase for data loading and exporting in Azure Synapse Analytics using serverless SQL pools and Spark pools
Migrated workflow orchestration from Oozie to Apache Airflow for incremental loads on Hadoop, extracting data from RDBMS sources
Implemented Kafka high-level consumers for obtaining data from Kafka partitions and moving it into HDFS (see the consumer sketch after this role)
Managed resources and scheduling on Azure Kubernetes Service for handling online and batch workloads
Leveraged Azure DevOps (VSTS) for CI/CD across Databricks, Azure Synapse Analytics, DBT, Dataflow, and data warehouse workloads; used Active Directory for authentication and Apache Ranger for authorization
Supported incident response and post-mortem processes, documenting incidents, resolutions, and steps to improve system reliability
Ensured compliance and security within AlloyDB by implementing encryption, access controls, and audit logs to meet data privacy and regulatory requirements
Collaborated with cross-functional teams to define data schemas, design data models, and optimize queries for performance in BigQuery, enhancing data accessibility and usability
Monitored and fine-tuned DAG performance through execution metrics, logs, and visualizations, ensuring optimal performance and identifying bottlenecks in data processing
Used Scala for concurrency support, developing map-reduce jobs for JVM-based data processing
Executed SQL queries and published data for interactive Power BI dashboards and reporting
Implemented dimensional data modeling for multi-dimensional star and snowflake schemas
Utilized Google Cloud Functions for serverless compute and Cloud Run for deploying microservices, improving scalability and reducing operational overhead in data-driven applications
Created and managed forms using ReactJS controlled components, validation libraries like Formik and Yup, and custom input components to collect and process user data efficiently
Assisted in automation of manual tasks and the implementation of CI/CD pipelines to streamline deployments and reduce human error in production environments
Ensured data pipeline scalability by structuring DAGs to handle high-volume datasets, utilizing parallel processing and resource optimization techniques to meet growing data demands
Collaborated with cross-functional teams to understand business requirements and develop SnapLogic solutions tailored to meet specific objectives
Utilized AlloyDB's compatibility with PostgreSQL to support complex SQL queries, foreign key relationships, and other advanced database features for transactional workloads
Managed access control and data security in BigQuery by implementing fine-grained IAM roles, encryption, and data auditing to ensure compliance with governance standards
Managed and optimized GCP Cloud Storage for large-scale data storage, ensuring efficient data retrieval, security, and cost-effective use of resources
Worked with WebSockets to implement real-time updates in ReactJS applications, allowing users to see live data without refreshing the page
Conducted performance tuning of pipelines to maximize efficiency, reducing runtime and resource consumption
Integrated SnapLogic with downstream applications, such as Tableau, Power BI, and other BI tools, to enable data-driven decision-making
Leveraged BigQuery's federated queries to connect with external data sources (e.g., Google Sheets, Cloud SQL) for cross-platform data analysis
Provided technical training and documentation on SnapLogic solutions to ensure team-wide adoption and knowledge sharing
Collaborated with cross-functional teams to design GCP-based solutions for big data analytics, machine learning, and data visualization, enhancing the organization's data-driven decision-making
Optimized ReactJS components by employing techniques such as memoization and PureComponent to reduce unnecessary re-renders and improve app performance
Created test cases and validation processes to ensure the accuracy and consistency of data pipelines
Monitored and maintained SnapLogic environments, ensuring optimal performance and timely updates to Snaplex nodes
Collaborated with the backend team to design and integrate APIs with ReactJS, using tools like Axios for HTTP requests, and ensuring efficient data fetching strategies with caching mechanisms
Conducted unit testing, system integration testing, and user acceptance testing (UAT) for database components, ensuring smooth deployment across environments
Enforced database security measures, including roles, privileges, and data masking, to safeguard sensitive data
Authored comprehensive technical documentation, including database schemas, PL/SQL code standards, and deployment guides, facilitating team collaboration and knowledge sharing
Designed and implemented Azure subscriptions, Data Factory instances, Kubernetes clusters, Virtual Machines, Azure SQL instances, and HDInsight clusters
Utilized Spark DataFrames in Azure Databricks for business transformations and data cleansing
Developed Python scripts for ETL pipelines, DBT models, and DAG workflows in Airflow and Apache NiFi
Designed custom input adapters using Spark and Hive to ingest and analyze data in Airflow, ingesting into Snowflake
Developed a scalable web application using Django, integrating complex business logic and user authentication
Led the migration of a legacy system to a Python-based platform, enhancing efficiency and scalability
Implemented a data processing pipeline using Python, DBT, Pandas, and NumPy, resulting in a 30% reduction in processing time
Designed and built RESTful APIs in Flask, facilitating seamless integration with front-end applications and third-party services (see the Flask sketch after this role)
Engineered an automated testing framework in Python for Azure Synapse Analytics workloads, improving code quality and reducing bug rates by 25%
Managed database schema design and integration with Hadoop, Dataflow, PostgreSQL and MongoDB, optimizing data retrieval and storage
Contributed to an open-source project in Python, adding features and fixing bugs, demonstrating community engagement and collaboration skills
Implemented real-time data visualization tools using Python libraries like Matplotlib and Seaborn, aiding in insightful business decision-making
Spearheaded a cross-functional team in an Agile environment, successfully delivering complex project milestones on schedule
Continuously refactored and optimized existing Python codebase, ensuring high performance and adherence to modern coding standards
Designed and implemented a robust SQL Server schema for a Hospital Management System (HMS), developing complex queries, stored procedures, and functions to enable efficient data retrieval and manipulation
Integrated SQL-based reporting tools for dynamic generation of patient and hospital operation reports
Developed and maintained ETL pipelines using Matillion for cloud data warehouses (AWS Redshift, Hadoop, Google BigQuery, DBT), optimizing data ingestion, transformation, and loading processes
Integrated data from multiple sources, improving accessibility and data accuracy
Collaborated with healthcare professionals to design clinical data solutions, creating an SQL-driven data warehouse for the HMS to facilitate advanced analytics and decision-making
Implemented database security measures to ensure patient data privacy and regulatory compliance
Built and maintained a POS system database to support retail operations, including inventory management and payment processing
Designed SQL-based solutions for real-time tracking of stock levels and sales analytics
Leveraged DBT for sales and customer data analytics, improving business intelligence capabilities
Utilized Python and Pandas for data cleaning, feature scaling, and enrichment to enhance reporting accuracy
Integrated customer data from various platforms (e.g., web, mobile, CRM) into Customer Data Platforms (CDPs) and Data Management Platforms (DMPs), creating a 360-degree view of the customer for targeted marketing and segmentation
Utilized PySpark for large-scale data transformations and Kafka for real-time streaming data analysis
Built data pipelines to process, merge, and enrich data, ensuring efficient ETL workflows
Developed data processing pipelines in AWS and containerized web analytics applications and pipelines using Docker to ensure scalability and portability across different environments
Managed routine backup, recovery, and disaster recovery solutions for both HMS and POS systems, ensuring high availability and operational continuity
Conducted statistical analysis using Python and presented findings to stakeholders
Developed web services using SOAP for XML data exchange and Flask for data manipulation and retrieval
Worked with Jenkins for continuous integration and deployment, utilizing GIT version control for efficient code management and collaboration
Utilized Informatica Power Center for ETL tasks and data integration
Environment: SQL Server, Matillion, AWS Redshift, Hadoop, Google BigQuery, DBT, Informatica, PySpark, Kafka, Docker, Flask, Python, Pandas, NumPy, SQL, AWS, Data Warehousing, Data Integration, ETL, Dimensional Data Modelling, Web Services, UNIX, Jenkins, Git.
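Illustrative sketch of a high-level Kafka consumer landing batches in HDFS, as noted above; it assumes the kafka-python and hdfs (WebHDFS) client libraries, and the hosts, topic, and paths are hypothetical placeholders.

```python
# Minimal sketch: consume JSON messages from a Kafka topic, buffer them, and
# write batches to HDFS over WebHDFS.
import json
from datetime import datetime

from hdfs import InsecureClient
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "clickstream",
    bootstrap_servers=["broker-1:9092"],
    group_id="hdfs-sink",
    auto_offset_reset="earliest",
    enable_auto_commit=True,
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
hdfs_client = InsecureClient("http://namenode:9870", user="etl")

buffer, batch_size = [], 1000
for message in consumer:
    buffer.append(message.value)
    if len(buffer) >= batch_size:
        # One file per batch, named by timestamp to keep writes append-only.
        path = f"/data/raw/clickstream/{datetime.utcnow():%Y%m%d_%H%M%S}.json"
        hdfs_client.write(
            path,
            data="\n".join(json.dumps(record) for record in buffer),
            encoding="utf-8",
        )
        buffer.clear()
```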
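Illustrative sketch of the Flask RESTful API pattern noted above; the resource names and in-memory store are hypothetical placeholders rather than the actual service.

```python
# Minimal Flask REST API sketch: GET and POST endpoints over a stand-in store.
from flask import Flask, jsonify, request

app = Flask(__name__)
_patients = {}  # stand-in for a real database layer


@app.route("/api/patients/<patient_id>", methods=["GET"])
def get_patient(patient_id):
    patient = _patients.get(patient_id)
    if patient is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(patient)


@app.route("/api/patients", methods=["POST"])
def create_patient():
    payload = request.get_json(force=True)
    _patients[payload["id"]] = payload
    return jsonify(payload), 201


if __name__ == "__main__":
    app.run(debug=True)
```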
Education
Master of Science - Data Science
Saint Peter's University
Jersey City, NJ
05.2001 -
Skills
Big Data Technologies: Hadoop, Spark, PySpark, Hive, Kafka, Flume, Sqoop, Oozie, Zookeeper, MapReduce, Cloudera Manager
Quote
Success is not final; failure is not fatal: It is the courage to continue that counts.