Accomplished Senior Data Engineer with 7+ years of experience designing, implementing, and optimizing data pipelines and architectures across diverse cloud platforms such as AWS, Azure, and GCP. Proficient with modern technologies and frameworks including Apache Spark, Apache Kafka, Apache Flink, Delta Lake, and Databricks for efficient data processing, transformation, and real-time analytics. Adept at leveraging cloud services such as AWS Lambda, Kinesis, Athena, Azure Data Factory, and Azure Databricks to build scalable, robust data solutions. Skilled in dimensional modeling, data warehousing, and BI tools such as Tableau and Power BI. Proven track record of developing and deploying data-driven applications using Python, SQL, Scala, and a wide range of data manipulation libraries, including web development with Django and Flask and data analysis with Pandas and NumPy. Strong experience in agile methodologies, DevOps practices, and containerization with Docker and Kubernetes.
Ally Bank faced the challenge of optimizing its data streaming pipeline to efficiently extract and transform data from diverse sources while ensuring data integrity and security.
As a Sr. Data Engineer, I led the development of a highly efficient data streaming pipeline, integrating Flink pipelines to ingest streaming data from Kinesis streams and implementing automated data validation using Apache Iceberg.
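A minimal sketch of the shape of such a job, using the PyFlink Table API with a Kinesis source and a simple validation filter. The stream name, schema, and connector options are illustrative (option names vary by connector version), the required connector JARs are assumed to be on the classpath, and a print sink stands in for the downstream Apache Iceberg table:

    from pyflink.table import EnvironmentSettings, TableEnvironment

    # Streaming Table API environment; assumes the Flink Kinesis connector JAR is available.
    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

    # Source: a hypothetical transactions stream on Kinesis, read as JSON.
    t_env.execute_sql("""
        CREATE TABLE transactions_src (
            account_id STRING,
            amount     DOUBLE,
            event_time TIMESTAMP(3)
        ) WITH (
            'connector' = 'kinesis',
            'stream' = 'transactions-stream',
            'aws.region' = 'us-east-1',
            'scan.stream.initpos' = 'LATEST',
            'format' = 'json'
        )
    """)

    # Sink: a print table standing in for the Iceberg table used in production.
    t_env.execute_sql("""
        CREATE TABLE transactions_clean (
            account_id STRING,
            amount     DOUBLE,
            event_time TIMESTAMP(3)
        ) WITH ('connector' = 'print')
    """)

    # Basic automated validation: drop records with missing keys or non-positive amounts.
    t_env.execute_sql("""
        INSERT INTO transactions_clean
        SELECT account_id, amount, event_time
        FROM transactions_src
        WHERE account_id IS NOT NULL AND amount > 0
    """)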
Responsibilities:
BEHR Corporation, a leading manufacturing company, faced a significant challenge in ingesting and processing data from various source systems efficiently while ensuring scalability and reliability.
As an Azure developer, I led the implementation of Azure Data Factory to address these challenges, enabling seamless ingestion and processing of diverse data sources. By integrating Azure Databricks with ADF, complex data transformations were handled efficiently, leveraging PySpark support for optimal data flows.
Responsibilities:
• Implemented Azure Data Factory (ADF) extensively to ingest data from different source systems, including relational and unstructured sources, to meet business functional requirements.
• Designed and developed batch pipelines in Azure using Azure Data Factory for efficient data processing and orchestration.
• Created numerous pipelines in Azure Data Factory to retrieve data from disparate source systems using activities such as Move & Transform, Copy, Filter, ForEach, and Databricks.
• Maintained and supported optimized pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
• Integrated Azure Data Lake Storage and Azure SQL Database for storing and processing large volumes of data, implementing data partitioning and indexing strategies to improve query performance and data retrieval speed.
• Implemented Delta Lake architecture in Azure Databricks to ensure ACID transactions, schema enforcement, and versioning capabilities, enhancing data reliability and consistency (a brief Delta Lake sketch follows this list).
• Integrated Azure Databricks with Apache Kafka for real-time data processing, enabling near-real-time insights and actionable intelligence for business stakeholders (see the streaming-read sketch after this list).
• Created a containerized solution on Azure Kubernetes Service (AKS), deploying a distributed microservices architecture across multiple containers to ensure seamless management and efficient resource utilization.
• Orchestrated the provisioning and configuration of AKS clusters using Terraform, automating infrastructure deployment and ensuring consistency across environments.
• Proficient in Azure API Management; created API gateways and published interfaces for streamlined data exchange, integrating Azure services, SaaS services, RESTful web services, and SOAP.
• Experienced in designing and implementing cloud architectures that prioritize security and compliance standards, leveraging Azure Active Directory.
• Incorporated Python APIs such as web APIs and Platform APIs into Azure Logic Apps and Functions for seamless integration and enhanced functionality.
• Implemented real-time data ingestion using Azure Event Hubs to efficiently manage structured and unstructured data.
• Implemented CI/CD pipelines with Azure DevOps to automate the deployment of containerized applications to AKS clusters, enabling rapid iteration and seamless delivery of features.
• Implemented Hadoop-based data lakes and distributed processing for efficient storage, retrieval, and analysis of big data.
• Experienced in Apache HBase, leveraging its NoSQL database capabilities to store and retrieve structured and semi-structured data at scale within the Hadoop ecosystem.
• Proficient in Apache Hive, demonstrated by designing and optimizing Hive queries to extract insights from large-scale datasets efficiently.
• Developed Hadoop-based analytics pipelines to process customer data, allowing for effective segmentation and personalization.
• Designed and implemented MapReduce algorithms to perform data aggregation, filtering, sorting, and other complex data transformations, enabling scalable and fault-tolerant data processing (a Hadoop Streaming sketch follows this list).
• Integrated MapReduce jobs with Apache HDFS for data storage and retrieval, ensuring seamless data movement and interoperability within the Hadoop ecosystem.
• Wrote Hive SQL scripts, using Python clients for both Spark and Presto, to create complex tables with performance optimizations such as partitioning, clustering, and skew handling (see the partitioned-table sketch after this list).
• Transferred existing cron jobs to Oozie for improved job scheduling and orchestration.
• Utilized Hadoop's distributed processing capabilities for real-time fraud detection, analyzing large datasets to identify anomalies and patterns indicative of fraudulent behavior.
• Implemented Hadoop-based data integration solutions to consolidate and analyze data from multiple channels, enabling a holistic view of customer interactions and shopping patterns.
• Utilized Hadoop-based frameworks to process and analyze social media data, extracting insights that inform marketing strategies and improve brand perception.
• Involved in building database models, APIs, and views using Python to deliver an interactive web-based solution.
• Responsible for gathering requirements, system analysis, design, development, testing and deployment.
• Generated Python Django forms to record user data.
• Utilized PyUnit, the Python unit testing framework, for all Python applications.
• Rewrote an existing application as a Python module to deliver data in a specific format.
• Developed Python batch processors to consume and produce various feeds.
• Worked with Python ORM Libraries including Django ORM.
• Implemented and optimized trading algorithms, backtest strategies, and analyzed market data.
• Conducted in-depth financial data analysis using Python, leveraging pandas for data manipulation and Matplotlib for visualizations to provide actionable insights (a brief analysis sketch follows this list).
• Leveraged AWS services such as EC2, EMR, S3, Lambda, API Gateway, and DynamoDB to build serverless architectures and microservices.
• Automated routine financial processes through Python scripts, improving efficiency and accuracy in tasks such as data entry, reporting, and reconciliation.
• Designed and developed user-friendly web interfaces for applications using Python frameworks like Django and Flask, ensuring seamless integration with databases and maintaining high security standards.
• Optimized AWS infrastructure for performance, cost-effectiveness, and scalability, utilizing services like Auto Scaling, Elastic Load Balancing, and AWS Cost Explorer.
• Implemented real-time data processing solutions using Python, optimizing SQL queries and utilizing in-memory databases to meet the demands of time-sensitive applications.
• Expertise in test and QA automation and in developing test cases for manual tests using Python.
• Developed Python code to build new features for HPE storage devices (Nimble).
• Developed automated test scripts using Python to validate functionality, perform regression testing, and ensure software quality across multiple platforms and browsers (see the Selenium test sketch after this list).
• Prepared unit test cases using Python, along with remote testing and performance testing.
• Developed web-based applications and other UI components using Django.
• Expertise in fast-paced Agile methodologies (Scrum), traditional software models (Waterfall), and Test-Driven Development (TDD).
• Experience in establishing design patterns to implement MVC and MVP architectures.
• Developed AWS CloudFormation templates, set up Auto Scaling for EC2 instances, and was involved in the automated provisioning of the AWS cloud environment using Jenkins.
• Developed code for handling switches and networks such as the Nexus data center switch.
• Developed test cases to automate web-based UI testing using Selenium.
• Used Jenkins builds for Continuous Integration and Continuous Deployment (CI/CD).
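The sketches below illustrate selected items above; all paths, table names, schemas, and endpoints are placeholders rather than production values. First, a minimal sketch of the Delta Lake pattern on Databricks (PySpark): an append that relies on Delta's default schema enforcement, plus a time-travel read for auditing.

    from pyspark.sql import SparkSession

    # On Databricks a SparkSession is provided as `spark`; created here for completeness.
    spark = SparkSession.builder.getOrCreate()

    # Append raw orders into a curated Delta table; Delta enforces the existing
    # table schema on append by default, rejecting mismatched writes.
    orders = spark.read.json("/mnt/raw/orders/")
    (orders.write
           .format("delta")
           .mode("append")
           .save("/mnt/curated/orders_delta"))

    # Time travel: read an earlier version of the table for auditing or rollback.
    orders_v0 = (spark.read
                      .format("delta")
                      .option("versionAsOf", 0)
                      .load("/mnt/curated/orders_delta"))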
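A minimal sketch of the Kafka-to-Databricks streaming read using Spark Structured Streaming; the broker address, topic, schema, and output paths are illustrative.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StringType, DoubleType

    spark = SparkSession.builder.getOrCreate()

    schema = (StructType()
              .add("order_id", StringType())
              .add("amount", DoubleType()))

    # Subscribe to a Kafka topic and parse the JSON payload.
    raw = (spark.readStream
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker:9092")
                .option("subscribe", "orders")
                .load())

    parsed = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
                 .select("e.*"))

    # Continuously append parsed events to a Delta table for near-real-time consumers.
    query = (parsed.writeStream
                   .format("delta")
                   .option("checkpointLocation", "/mnt/checkpoints/orders")
                   .start("/mnt/curated/orders_stream"))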
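A minimal Hadoop Streaming sketch of the MapReduce aggregation pattern, with the mapper and reducer written in Python; the field positions and the aggregation key are illustrative.

    #!/usr/bin/env python
    # mapper.py: emit <customer_id, amount> pairs from tab-separated input records.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip malformed records
        print(f"{fields[0]}\t{fields[1]}")

    #!/usr/bin/env python
    # reducer.py: sum amounts per customer_id (Hadoop sorts mapper output by key).
    import sys

    current_key, total = None, 0.0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key and current_key is not None:
            print(f"{current_key}\t{total:.2f}")
            total = 0.0
        current_key = key
        total += float(value)
    if current_key is not None:
        print(f"{current_key}\t{total:.2f}")

Such scripts would typically be submitted with the hadoop-streaming JAR, passing mapper.py and reducer.py via the -files, -mapper, and -reducer options against HDFS input and output paths.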
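A minimal sketch of creating and loading a partitioned Hive table through Spark SQL; the table names, columns, and partition value are illustrative, and the source table sales_raw is assumed to exist.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Partition the curated table by load date to prune scans in downstream queries.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_curated (
            order_id    STRING,
            customer_id STRING,
            amount      DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
    """)

    # Load a single partition from the raw table.
    spark.sql("""
        INSERT OVERWRITE TABLE sales_curated PARTITION (load_date = '2023-01-01')
        SELECT order_id, customer_id, amount
        FROM sales_raw
        WHERE load_date = '2023-01-01'
    """)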
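A brief sketch of the pandas/Matplotlib analysis workflow; the input file and column names are placeholders.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Load daily prices and index by date.
    prices = pd.read_csv("daily_prices.csv", parse_dates=["date"]).set_index("date")

    # Derive a 20-day moving average and daily returns.
    prices["ma_20"] = prices["close"].rolling(window=20).mean()
    prices["daily_return"] = prices["close"].pct_change()

    # Plot price against its moving average to visualize the trend.
    ax = prices[["close", "ma_20"]].plot(figsize=(10, 5), title="Close vs. 20-day moving average")
    ax.set_ylabel("Price")
    plt.tight_layout()
    plt.savefig("close_vs_ma20.png")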
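A minimal Selenium test sketch in the PyUnit (unittest) style; the URL and expected title are placeholders, and a local Chrome driver is assumed to be available.

    import unittest
    from selenium import webdriver

    class LoginPageTest(unittest.TestCase):

        def setUp(self):
            # Assumes a Chrome driver is available on PATH.
            self.driver = webdriver.Chrome()

        def test_login_page_title(self):
            self.driver.get("https://example.com/login")
            self.assertIn("Login", self.driver.title)

        def tearDown(self):
            self.driver.quit()

    if __name__ == "__main__":
        unittest.main()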