Experienced Sr. Data Engineer with 9+ years of proven history designing and deploying scalable data ingestion pipelines across Big Data, AWS, Microsoft Azure, PySpark, and Python, including transitions from on-premises systems to Azure cloud solutions.
- Developed and implemented end-to-end data integration solutions using Azure Data Factory, orchestrating data integration workflows with linked services, source and sink datasets, pipelines, and activities to extract, transform, and load data from diverse sources into target systems.
- Hands-on experience with Azure Databricks for distributed data processing, transformation, validation, and cleansing, ensuring data quality and integrity.
- Designed and implemented end-to-end data workflows with Azure Logic Apps, Azure Functions, and serverless solutions.
- Extensive experience implementing solutions using AWS services (EC2, S3, and Redshift), Hadoop HDFS architecture, and the MapReduce framework; worked in AWS environments developing and deploying custom Hadoop applications.
- Hands-on experience with Python Boto3 for developing AWS Lambda functions (see the Lambda sketch below).
- Used Azure Event Hubs to efficiently ingest real-time streaming data.
- Strong proficiency with Azure Synapse Pipelines for orchestrating and managing data integration and transformation workflows.
- Extensive hands-on experience with Azure Blob Storage, ensuring efficient storage and retrieval of both unstructured and semi-structured data.
- Managed databases and Azure data platform services: Azure Data Lake Storage (ADLS), Data Lake Analytics, Stream Analytics, Azure SQL Data Warehouse, HDInsight/Databricks, NoSQL databases, SQL Server, Oracle, and data warehouses; built multiple data lakes.
- Proficient in Python and Scala using the Spark framework.
- Worked with a range of file formats, including CSV, JSON, Parquet, and Avro, ensuring optimized storage, processing, and data interchange within data engineering pipelines and analytics workflows (see the conversion sketch below).
- Implemented data pipeline solutions using Hadoop, Azure, ADF, Synapse, PySpark, MapReduce, Hive, Tez, Python, Scala, Azure Functions, Logic Apps, StreamSets, ADLS Gen2, and Snowflake.
- Strong background in data pipeline development and data modeling.
- Proficient in Kafka streaming, employing its distributed messaging capabilities to construct resilient, high-performing data flows.
- Designed, deployed, and optimized scalable ML workflows on GCP, leveraging services such as Vertex AI, BigQuery, and Cloud Functions for seamless integration and automation.
- Experience with generative AI technologies, including OpenAI GPT-4, Google Vertex AI, and Meta Llama, with expertise in prompt design, prompt engineering, and fine-tuning large language models (LLMs).
- Utilized GitHub Copilot to streamline code development, improve code quality, and enhance collaboration within the data science team.
- Built the codebase for a natural language processing (NLP) and machine learning (ML) framework, using C++ STL containers and algorithms in the application.
- Innovated and leveraged ML, data mining, and statistical techniques to create new, scalable solutions for business problems.
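As a brief illustration of the Boto3/Lambda work above, here is a minimal sketch of an S3-triggered handler; the bucket layout, "processed/" prefix, and event trigger are assumptions for the example, not a specific production implementation.

```python
# Minimal sketch of an AWS Lambda handler using boto3, assuming an
# S3 put-event trigger; bucket and prefix names are hypothetical.
import urllib.parse

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Copy the newly landed object into a "processed/" prefix.
        s3.copy_object(
            Bucket=bucket,
            Key=f"processed/{key}",
            CopySource={"Bucket": bucket, "Key": key},
        )
    return {"status": "ok"}
```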
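And a minimal sketch of the kind of format-conversion step described above, assuming a running SparkSession; the paths and column names (order_id, order_date) are illustrative, not taken from a real project.

```python
# Minimal PySpark sketch converting raw CSV into partitioned Parquet;
# all paths and columns are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv_to_parquet").getOrCreate()

# Read semi-structured CSV with header and schema inference.
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("s3://raw-zone/orders/"))       # hypothetical landing path

# Basic cleansing: drop exact duplicates and rows missing the key column.
clean = df.dropDuplicates().na.drop(subset=["order_id"])

# Write columnar Parquet, partitioned by date for efficient retrieval.
(clean.write
 .mode("overwrite")
 .partitionBy("order_date")
 .parquet("s3://curated-zone/orders/"))    # hypothetical curated path
```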
Designed the AI/ML data pipeline for regular monitoring and performance evaluation of deployed ML models.
- Worked with Google Cloud Storage (GCS) and Pub/Sub to manage streaming and batch data processing for ML pipelines.
- Developed scalable microservices and REST APIs for model serving using Cloud Run and App Engine.
- Designed and deployed real-time data pipelines with Spark Streaming, efficiently processing large data volumes from diverse sources (see the streaming sketch below).
- Scheduled Hadoop jobs using Apache Oozie; imported and exported data between HDFS and relational database systems using Sqoop.
- Optimized Hive and Spark query performance through strategic bucketing and partitioning for efficient data retrieval and storage, with extensive hands-on experience tuning Spark jobs.
- Performed analysis using Python libraries such as PySpark.
- Strong experience working with Amazon Elastic MapReduce (EMR) and setting up environments on AWS EC2 instances.
- Configured and managed ZooKeeper to ensure efficient coordination and synchronization of distributed data processing systems.
- Formulated and implemented data integration strategies connecting Snowflake with external systems, utilizing Apache Airflow and custom-built orchestration frameworks to ensure seamless data movement and synchronization.
- Integrated Snowflake with Azure Data Factory to orchestrate complex ETL pipelines, significantly optimizing data migration from diverse sources into Azure-based data warehouses.
- Managed Snowflake's unique features, such as Zero-Copy Cloning, Time Travel, and Data Sharing, for efficient data management; implemented data pipelines using SnowSQL, Snowflake integration services, and Snowpipe.
- Implemented SQL analytical and window functions for advanced data analysis (see the window-function sketch below).
- Proficient in utilizing Informatica Cloud for cloud-based data integration and management.
- Partnered effectively with data analysts and stakeholders to implement data models, structures, and designs in seamless coordination.
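A minimal sketch of a Spark Structured Streaming pipeline reading from Kafka, assuming the spark-sql-kafka connector is on the classpath and a broker is reachable; the topic name and sink paths are hypothetical.

```python
# Minimal Spark Structured Streaming sketch reading from Kafka;
# requires the spark-sql-kafka connector package. Broker address,
# topic, and output paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka_stream").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")          # hypothetical topic
          .load()
          .select(col("value").cast("string").alias("payload")))

# Land micro-batches as Parquet, with checkpointing for fault tolerance.
query = (events.writeStream
         .format("parquet")
         .option("path", "/data/streaming/events/")
         .option("checkpointLocation", "/data/checkpoints/events/")
         .start())

query.awaitTermination()
```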
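And a minimal sketch of the window-function style of analysis mentioned above, using Spark SQL over a small in-memory table; the sales schema is invented for the example.

```python
# Minimal Spark SQL sketch of analytical window functions over a
# hypothetical sales table built in memory for the example.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("window_demo").getOrCreate()

df = spark.createDataFrame(
    [(1, "2024-01-05", 120.0),
     (1, "2024-02-10", 80.0),
     (2, "2024-01-20", 200.0)],
    ["customer_id", "order_date", "amount"],
)
df.createOrReplaceTempView("sales")

# Rank each customer's orders by recency and compute a running total.
spark.sql("""
    SELECT customer_id,
           order_date,
           amount,
           ROW_NUMBER() OVER (
               PARTITION BY customer_id
               ORDER BY order_date DESC
           ) AS recency_rank,
           SUM(amount) OVER (
               PARTITION BY customer_id
           ) AS customer_total
    FROM sales
""").show()
```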