Priya G

Irving, Texas

Summary

Accomplished Senior Data Engineer with 7+ years of experience designing, implementing, and optimizing data pipelines and architectures across diverse cloud platforms, including AWS, Azure, and GCP. Proficient with modern technologies and frameworks, including Apache Spark, Apache Kafka, Apache Flink, Delta Lake, and Databricks, for efficient data processing, transformation, and real-time analytics. Adept at leveraging cloud services such as AWS Lambda, Kinesis, Athena, Azure Data Factory, and Azure Databricks to build scalable, robust data solutions. Skilled in dimensional modeling, data warehousing, and BI tools such as Tableau and Power BI. Proven track record of developing and deploying data-driven applications in Python, SQL, and Scala, with deep experience in data manipulation and analysis libraries such as Pandas and NumPy and in web development with Django and Flask. Strong experience with agile methodologies, DevOps practices, and containerization with Docker and Kubernetes.

Overview

8 years of professional experience

Work History

Sr. Data Engineer

Ally Bank
05.2023 - Current

Ally Bank faced the challenge of optimizing its data streaming pipeline to efficiently extract and transform data from diverse sources while ensuring data integrity and security.

As a Sr. Data Engineer, I led the development of a highly efficient data streaming pipeline, integrating Flink pipelines to ingest streaming data from Kinesis streams and implementing automated data validation with Apache Iceberg.

Responsibilities:

  • Worked on developing a highly efficient data streaming pipeline that extracts data from multiple sources and applies the transformations based on the data model requirements.
  • Designed and implemented Flink pipelines to ingest streaming data from Kinesis data streams, applying business logic for transformation and serialization into Apache Iceberg data models. Implemented Kinesis for seamless data streaming from diverse sources to Spark, facilitating the transformation of raw data into structured models within Iceberg.
  • Managed S3 buckets and formulated policies, integrating Glacier S3 for backup purposes and ensuring robust data governance.
  • Engineered API gateways with Lambda authorizers to authenticate data access, enhancing security measures for data storage and retrieval processes.
  • Leveraged Lambda functions with boto3 to store data in DynamoDB, optimizing data storage and retrieval efficiency (an illustrative sketch follows this list). Established automated data validation and quality checks using Apache Iceberg, ensuring data integrity and consistency across versions.
  • Utilized Databricks notebooks to execute data transformations with PySpark and Scala, conducting aggregations and performing data cleansing to construct structured data models for downstream applications.
  • Integrated Databricks with AWS services including S3, EMR, and Lambda, facilitating seamless interaction with data stored in AWS cloud storage infrastructure.
  • Proficiently utilized Spark SQL, Structured Streaming, and Spark Context in conjunction with Spark-Scala for concurrent data transformations and manipulation tasks.
  • Built DataFrames and RDDs in Spark using Scala, applying transformations and actions to analyze data and store it efficiently in Cassandra.
  • Developed shell scripts to validate data files and trigger necessary transformations for data storage in DynamoDB, ensuring data quality and consistency.
  • Implemented alerts using AWS SDKs and CLI tools in AWS CloudWatch to monitor failovers in the streaming pipeline, enabling prompt identification and resolution of issues.
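
A minimal sketch of the Lambda-to-DynamoDB write pattern referenced above, assuming a hypothetical event shape, table name, and record schema (transaction_id, account_id, and amount are illustrative fields, not the production model):

    import json
    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("transactions")  # hypothetical table name

    REQUIRED_FIELDS = {"transaction_id", "account_id", "amount"}  # assumed schema

    def handler(event, context):
        """Validate incoming records and persist the valid ones to DynamoDB."""
        stored, rejected = 0, 0
        for record in event.get("records", []):
            payload = json.loads(record["data"])
            # Basic validation: reject records missing any required field.
            if not REQUIRED_FIELDS.issubset(payload):
                rejected += 1
                continue
            table.put_item(Item=payload)
            stored += 1
        return {"stored": stored, "rejected": rejected}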

Azure Developer

BEHR Corporation
12.2022 - 04.2023

BEHR Corporation, a leading manufacturing company, faced the significant challenge of efficiently ingesting and processing data from various source systems while ensuring scalability and reliability.

As an Azure Developer, I led the implementation of Azure Data Factory to address these challenges, enabling seamless ingestion and processing of diverse data sources. By integrating Azure Databricks with ADF, complex data transformations were handled efficiently, leveraging PySpark support for optimized data flows.

Responsibilities:

• Implemented Azure Data Factory (ADF) extensively for ingesting data from different source systems like relational and unstructured data to meet business functional requirements.

• Designed and developed batch pipelines in Azure using Azure Data Factory for efficient data processing and orchestration.

• Created numerous pipelines in Azure using Azure Data Factory to retrieve data from disparate source systems, using activities such as Move & Transform, Copy, Filter, ForEach, and Databricks.

• Maintained and supported optimized pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.

• Integrated Azure Data Lake Storage and Azure SQL Database for storing and processing large volumes of data, implementing data partitioning and indexing strategies to improve query performance and data retrieval speed.

• Implemented Delta Lake architecture in Azure Databricks to ensure ACID transactions, schema enforcement, and versioning capabilities, enhancing data reliability and consistency (an illustrative sketch follows this list).

• Integrated Azure Databricks with Apache Kafka for real-time data processing, enabling near-real-time insights and actionable intelligence for business stakeholders.

• Created a containerized solution on Azure Kubernetes Service (AKS), deploying a distributed microservices architecture across multiple containers to ensure seamless management and efficient resource utilization.

• Orchestrated the provisioning and configuration of AKS clusters using Terraform, automating infrastructure deployment and ensuring consistency across environments.

• Used Azure API Management to create API gateways and publish interfaces for streamlined data exchange, integrating Azure services, SaaS services, RESTful web services, and SOAP.

• Experienced in designing and implementing cloud architectures that prioritize security and compliance standards, leveraging Azure Active Directory.

• Incorporated Python APIs such as web APIs and Platform APIs into Azure Logic Apps and Functions for seamless integration and enhanced functionality.

• Implemented real-time data ingestion using Azure Event Hubs to efficiently manage structured and unstructured data.

• Implemented CI/CD pipelines with Azure DevOps to automate the deployment of containerized applications to AKS clusters, enabling rapid iteration and seamless delivery of features.
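
A minimal sketch of the Delta Lake write pattern referenced above, as it might appear in a Databricks notebook where the spark session is predefined; the storage account, container paths, and column names are illustrative assumptions:

    from pyspark.sql import functions as F

    # Read raw JSON landed in ADLS (path is a hypothetical example).
    raw = spark.read.json("abfss://raw@storageacct.dfs.core.windows.net/orders/")

    # Light transformation before persisting: type casting and de-duplication.
    orders = (
        raw.withColumn("order_ts", F.to_timestamp("order_ts"))
           .dropDuplicates(["order_id"])
    )

    # Write as a Delta table; Delta provides the ACID transactions and
    # schema enforcement mentioned above.
    (orders.write
           .format("delta")
           .mode("append")
           .save("abfss://curated@storageacct.dfs.core.windows.net/delta/orders"))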

Hadoop Engineer

Ace Turtle
06.2020 - 07.2022

• Implemented Hadoop-based data lakes and distributed processing using tools like Apache Hadoop for efficient storage, retrieval, and analysis of big data.

• Experienced in Apache HBase, leveraging its NoSQL database capabilities to store and retrieve structured and semi-structured data at scale within the Hadoop ecosystem.

• Proficient in Apache Hive, demonstrated by designing and optimizing Hive queries to extract insights from large-scale datasets efficiently.

• Developed Hadoop-based analytics pipelines to process customer data, allowing for effective segmentation and personalization.

• Designed and implemented MapReduce algorithms to perform data aggregation, filtering, sorting, and other complex data transformations, enabling scalable and fault-tolerant data processing (an illustrative sketch follows this list).

• Integrated MapReduce jobs with Apache HDFS for data storage and retrieval, ensuring seamless data movement and interoperability within the Hadoop ecosystem.

• Wrote scripts in Hive SQL, using the Python plugin for both Spark and Presto, to create complex tables with performance optimizations such as partitioning, clustering, and skewing.

• Transferred existing cron jobs to Oozie for improved job scheduling and orchestration.

• Utilized Hadoop's distributed processing capabilities for real-time fraud detection, analyzing large datasets to identify anomalies and patterns indicative of fraudulent behavior.

• Implemented Hadoop-based data integration solutions to consolidate and analyze data from multiple channels, enabling a holistic view of customer interactions and shopping patterns.

• Utilized Hadoop-based frameworks to process and analyze social media data, extracting insights that inform marketing strategies and improve brand perception.
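
A minimal sketch of the MapReduce-style aggregation referenced above, written as a pair of Hadoop Streaming scripts in Python; the tab-separated channel/amount input format is an illustrative assumption:

    # mapper.py - emit (channel, amount) pairs from tab-separated input lines
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip malformed lines
        channel, amount = fields[0], fields[1]
        print(f"{channel}\t{amount}")

    # reducer.py - sum amounts per channel (input arrives sorted by key)
    import sys

    current_key, total = None, 0.0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{total}")
            current_key, total = key, 0.0
        total += float(value)
    if current_key is not None:
        print(f"{current_key}\t{total}")

Scripts like these are typically submitted through the hadoop-streaming JAR, passing the mapper, reducer, and the HDFS input and output paths.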

Python Developer

Eko India Financial Services
11.2018 - 05.2020

• Involved in building database models, APIs, and views using Python to deliver an interactive web-based solution.

• Responsible for gathering requirements, system analysis, design, development, testing and deployment.

• Generated Python Django forms to record user data.

• Utilized PyUnit, the Python unit testing framework, for all Python applications.

• Rewrote an existing application as a Python module to deliver data in the required format.

• Developed Python batch processors to consume and produce various feeds.

• Worked with Python ORM Libraries including Django ORM.

• Implemented and optimized trading algorithms, backtest strategies, and analyzed market data.

• Conducted in-depth financial data analysis using Python, leveraging pandas for data manipulation and Matplotlib for visualizations to provide actionable insights (an illustrative sketch follows this list).

• Leveraged AWS services such as EC2, EMR, S3, Lambda, API Gateway, and DynamoDB to build serverless architectures and microservices.

• Automated routine financial processes through Python scripts, improving efficiency and accuracy in tasks such as data entry, reporting, and reconciliation.

• Designed and developed user-friendly web interfaces for applications using Python frameworks like Django and Flask, ensuring seamless integration with databases and maintaining high security standards.

• Optimized AWS infrastructure for performance, cost-effectiveness, and scalability, utilizing services like Auto Scaling, Elastic Load Balancing, and AWS Cost Explorer.

• Implemented real-time data processing solutions using Python, optimizing SQL queries and utilizing in-memory databases to meet the demands of time-sensitive applications.
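
A minimal sketch of the pandas and Matplotlib analysis referenced above; the file name and column names (date, category, amount) are illustrative assumptions, not the actual data model:

    import pandas as pd
    import matplotlib.pyplot as plt

    # Load transaction data (hypothetical file and columns).
    df = pd.read_csv("transactions.csv", parse_dates=["date"])

    # Aggregate monthly transaction volume per category.
    monthly = (
        df.groupby([pd.Grouper(key="date", freq="M"), "category"])["amount"]
          .sum()
          .unstack("category")
    )

    # Plot the trend to highlight seasonality and category mix.
    monthly.plot(kind="line", figsize=(10, 5), title="Monthly volume by category")
    plt.ylabel("Amount")
    plt.tight_layout()
    plt.savefig("monthly_volume.png")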

Software Engineer

Hewlett Packard Enterprise
02.2016 - 10.2018

• Expertise in test and QA automation and in developing test cases for manual testing using Python.

• Developed Python code to build new features for HPE storage devices (Nimble).

• Developed automated test scripts using Python to validate functionality, perform regression testing, and ensure software quality across multiple platforms and browsers.

• Prepared unit test cases using Python, along with remote testing and performance testing.

• Developed web-based applications and other GUI components using Django.

• Expertise in fast-paced Agile Methodologies (Scrum), Traditional Software models (Waterfall, Test-Driven Development (TDD)).

• Experience in establishing better design patterns to implement MVC and MVP architecture.

• Developed AWS CloudFormation templates, set up Auto Scaling for EC2 instances, and automated provisioning of the AWS cloud environment using Jenkins.

• Developed code for handling switches and networks, such as the Nexus data center switch.

• Developed test cases to automate web-based UI testing using Selenium (an illustrative sketch follows this list).

• Used Jenkins build to deploy for Continuous Integration and Continuous Deployment (CI/CD).
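
A minimal sketch of the Selenium-based UI automation referenced above, using Python's unittest; the URL and element locator are illustrative assumptions:

    import unittest
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    class LoginPageTest(unittest.TestCase):
        def setUp(self):
            # Headless Chrome keeps the test runnable on a CI agent (e.g. Jenkins).
            options = webdriver.ChromeOptions()
            options.add_argument("--headless")
            self.driver = webdriver.Chrome(options=options)

        def tearDown(self):
            self.driver.quit()

        def test_login_page_renders(self):
            self.driver.get("https://example.com/login")  # hypothetical URL
            heading = self.driver.find_element(By.TAG_NAME, "h1").text
            self.assertIn("Sign in", heading)

    if __name__ == "__main__":
        unittest.main()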

Education

Master of Science - Computer Science

Wright State University
Dayton, OH
05.2001 -

Skills

AWS (Lambda, Kinesis, Athena, S3, EMR, DynamoDB), Azure (Data Factory, Databricks, Event Hubs, AKS), GCP, Apache Spark, Apache Kafka, Apache Flink, Delta Lake, Apache Iceberg, Hadoop, Hive, HBase, Python (Pandas, NumPy, Django, Flask), SQL, Scala, Tableau, Power BI, Docker, Kubernetes, Terraform, Jenkins, Azure DevOps, Agile/Scrum

Timeline

Sr. Data Engineer

Ally Bank
05.2023 - Current

Azure Developer

BEHR Corporation
12.2022 - 04.2023

Hadoop Engineer

Ace Turtle
06.2020 - 07.2022

Python Developer

Eko India Financial Services
11.2018 - 05.2020

Software Engineer

Hewlett Packard Enterprise
02.2016 - 10.2018

Master of Science - Computer Science

Wright State University
05.2001 -