
DEVIPRIYA DAMODAR

San Antonio, TX

Summary

Responsive expert experienced in monitoring database performance, troubleshooting issues, and optimizing database environments. Possesses strong analytical skills, excellent problem-solving abilities, and a deep understanding of database technologies and systems. Equally confident working independently and collaboratively as needed, with excellent communication skills.

Overview

6 years of professional experience
1 Certification

Work History

Sr. Data Engineer

HireRight - Accenture
10.2021 - Current
  • Implement end-to-end data solutions, including storage, integration, and processing, in GCP
  • Expert in implementing data pipelines on Big Data platforms using Spark
  • Familiarity with real-time streaming and processing of various data sources, including logs, time series telemetry data, unstructured social data, and relational data
  • Develop and deploy solutions using Spark and Python code on clusters running in GCP
  • Developed data models according to business requirements
  • Expertise in building data integration and preparation tools using cloud technologies such as Google Dataflow and Cloud Dataprep
  • Experience in migrating existing legacy applications into optimized data pipelines using Spark with Scala and Python, with an emphasis on testability and observability
  • Experience in creating scalable real-time applications for ingesting clickstream data using Kafka Streams and Spark Streaming
  • Experience of designing, building, and deploying production-level data pipelines using Kafka
  • Developed, optimized, and tuned ETL operations in Spark scripts
  • Demonstrated expertise in selecting appropriate encryption algorithms based on security requirements and performance considerations
  • Implemented end-to-end encryption for Avro files, ensuring the confidentiality and integrity of data stored in the Avro format
  • Implemented message-level encryption using industry-standard protocols like SSL/TLS to establish secure communication channels within Kafka clusters
  • Demonstrated expertise in handling schema evolution and ensuring compatibility in encrypted Avro files, allowing for seamless data updates while maintaining security
  • Created Jupyter notebooks with PySpark for extensive in-depth data analysis and exploration
  • Developed code coverage and test case integrations using Sonar
  • Experience in converting Hive scripts to PySpark applications for faster ETL operations
  • Demonstrated proficiency in designing, developing, and implementing Extract, Transform, Load (ETL) processes using traditional ETL tools
  • Extensive experience with DataStage, leveraging its features for efficient data integration, transformation, and loading
  • Orchestrated CI/CD workflows to enhance development efficiency and ensure faster delivery of software
  • Worked on DataStage tools like DataStage Designer, DataStage Director, DataStage Administrator
  • Worked on scheduling sequence and parallel jobs in DataStage Director
  • Pushed application logs and data stream logs to a Kibana server for monitoring and alerting purposes
  • Worked on converting multiple SQL Server and Oracle stored procedures into Hadoop using Spark SQL, Hive, Scala, and Java
  • Created Bash scripts for file manipulation, parsing, and data extraction, contributing to streamlined data processing pipelines
  • Designed and implemented automated workflows using Control-M to streamline end-to-end business processes and ensure the stability and availability of critical systems
  • Utilized awk, sed, and grep commands effectively for text processing and pattern matching
  • Used Sqoop to extract data from Oracle, SQL Server, and MySQL databases to HDFS
  • Worked on migrating data from HDFS to Azure Databricks
  • Involved in loading data from LINUX file system to HDFS
  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files
  • Developed end-to-end unit and integration testing for data pipelines using Spark with Scala.

Data Engineer

The Weather Company, IBM
12.2019 - 10.2021
  • Experience in migrating existing legacy applications into optimized data pipelines using Spark with Scala and Python, with an emphasis on testability and observability
  • Experience in creating scalable real-time applications for ingesting clickstream data using Kafka Streams and Spark Streaming
  • Developed, optimized, and tuned ETL operations in Hive and Spark scripts
  • Designed and executed complex data transformations to meet business requirements, ensuring data quality and consistency
  • Utilized DataStage transformations effectively to handle data cleansing, validation, and enrichment
  • Worked on Talend integrations to ingest data from various sources into the data lake
  • Developed an MVP moving trading data to Snowflake to evaluate usage and migration benefits
  • Implemented Cloud integrations to AWS and Azure storage buckets for bi-directional data flow setups for data migrations
  • Used Terraform for infrastructure as code
  • Created various AWS resources using Terraform
  • Hands-on with Redshift (built ETL data pipelines from the AWS MySQL engine to Redshift)
  • Developed PL/SQL procedures and used them in Stored Procedure Transformations
  • Developed Bash scripts to interact with RESTful APIs, handling authentication, data retrieval, and payload processing
  • Ensured secure and efficient communication with external APIs through script implementation
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
  • Automated end-to-end jobs using Oozie on the on-prem cluster and Airflow in the cloud
  • Extensively worked on building automations using Shell Script and Python
  • Created Jupyter notebooks with PySpark for extensive in-depth data analysis and exploration
  • Developed code coverage and test case integrations using Sonar
  • Experience in converting Hive scripts to PySpark applications for faster ETL operations
  • Pushed application logs and data stream logs to a Kibana server for monitoring and alerting purposes
  • Worked on migrating data from HDFS to Azure HD Insights and Azure Databricks
  • Monitored and managed Control-M jobs to proactively identify and resolve issues, ensuring the stability and availability of critical systems
  • Implemented various modules in microservices to expose data through RESTful APIs
  • Converted SAS datasets into CSV files using PySpark
  • Implemented end-to-end CI/CD pipelines using GitLab CI for automated build, test, and deployment processes
  • Developed Jenkins pipelines for continuous integration and deployment purposes
  • Implemented Docker pipelines for testing and validation in integration and deployment process
  • Developed end-to-end unit and integration testing for data pipelines using Spark with Scala.

Data Engineer

Wafer Space
11.2017 - 12.2019
  • Experience working on multiple projects and agile teams involved in analytics and cloud platforms
  • Experience in building scalable data pipelines in Azure cloud platform using different tools
  • Developed multiple optimized PySpark applications using Azure Databricks
  • Built bi-directional ingestion pipelines in Hadoop and AWS S3 storage
  • Developed data pipelines using Azure Data Factory that process cosmos activity
  • Implemented reporting stats on top of real time data using Power BI
  • Assigned user-level and group-level permissions on Redshift schemas for security purposes
  • Restored PostgreSQL data into Redshift
  • Managed metadata alongside the data for visibility into where data came from and its lineage, enabling data for customer projects to be found quickly and efficiently using AWS Data Lake and services like AWS Lambda and AWS Glue
  • Developed ETL solutions using SSIS, Azure Data Factory, and Azure Databricks
  • Developed Stored Procedures, Functions, Packages and SQL Scripts using PL/SQL
  • Scheduled and managed jobs in Control-M
  • Loaded data from legacy systems (ETL operations) using PL/SQL and SQL*Loader
  • Expert in continuous integration and deployment using Jenkins
  • Developed real time ingestion data pipelines from Event Hub into different tools
  • Developed optimized ETL jobs using PySpark
  • Experience in building ETL solutions using Hive and Spark with Python and Scala
  • Expert in working on optimizing applications built using tools like Spark and Hive
  • Developed a custom message consumer to consume the data from the Kafka producer and push the messages to Service Bus and Event Hub (Azure Components)
  • Experienced in building data pipelines in several cloud platforms like Azure, AWS and GCP
  • Developed Azure Data Factory data pipelines that process the data utilizing the Cosmos Activity
  • Developed real time streaming dashboards in Power BI using Stream Analytics
  • Developed job automations from different clusters using Airflow Scheduler
  • Worked on Talend integration with the on-prem cluster and Azure SQL for data migrations
  • Developed code coverage and test case integrations using Sonar and Mockito
  • Implemented SFTP protocols to ensure secure and encrypted file transfers over networks
  • Configured and maintained SFTP servers to facilitate the secure exchange of sensitive data
  • Conducted performance optimizations to balance the need for strong encryption with the requirement for efficient and timely file transfers
  • Implemented compression techniques to improve data transfer speeds while maintaining security
  • Experience in developing SFTP, NAS integrations to ingest data into HDFS using Python
  • Implemented messaging queues and routes using Apache Camel microservices in Java
  • Developed batch ingestion jobs from Teradata to HDFS and Hive using Sqoop
  • Developed part of data lake platform using multiple tools like Kafka, Sqoop, Hive, Spark and Oozie
  • Implemented end-to-end job automation in Hadoop using Apache Oozie
  • Developed transactional system updates in HBase for data lake implementations
  • Developed end-to-end ETL operations in an optimized way using Hive and Spark
  • Expert in handling complex data issues, memory optimization, and tuning in Spark
  • Implemented multiple data pipelines using Apache Spark with Python and Scala
  • Implemented CI/CD pipelines to build and deploy projects in the Hadoop environment using Jenkins
  • Developed a real-time streaming application to ingest JSON messages using Apache Kafka
  • Implemented Data security features in data exposed through API endpoints
  • Expert in implementing features using scripting languages like Bash, Shell, and Python
  • Worked on implementing CRUD operations in HBase for multiple applications
  • Handled tickets and service calls raised by end users and provided fast resolutions
  • Implemented multiple change requests, including new development work
  • Coordinated with the offshore team for timely completion of deliverables.

Education

Master’s in Data Analytics -

Northeastern University
Boston, MA
07.2021

Bachelor of Technology -

Amrita School of Engineering
India

Skills

  • ETL development
  • Big Data Processing
  • Python Programming
  • NoSQL Databases
  • Data Pipeline Design
  • Data Modeling
  • Hadoop Ecosystem
  • Data Warehousing
  • Spark Development
  • Advanced SQL

Technology Summary

  • Expert in working with different big data distributions like Cloudera, Hortonworks, and MapR.
  • Experience in implementing bi-directional batch data ingestion flows using Apache Sqoop.
  • Expert in building real-time ingestion flows into HDFS and various databases using Flume and Kafka.
  • Expert in handling different optimized data formats like ORC, Parquet, Avro, and Sequence files.
  • Implemented real-time JSON and Avro streaming into dynamic Hive tables using Kafka.
  • Worked on implementing various ETL transformations using MapReduce and Pig.
  • Experience in the upgrade and migration of OBIEE between dev/test/prod environments.
  • Implemented optimized data pipelines in Hive and Spark for various data transformations.
  • Extensively worked on various Hive optimization techniques to run jobs without issues.
  • Developed efficient, scalable Spark applications using Python and Scala for ETL purposes.
  • Implemented various optimization techniques, such as memory optimizations, in Spark.
  • Knowledge of using machine learning libraries in Spark for data exploration and prediction.
  • Automated end-to-end jobs in the on-prem Hadoop cluster using Apache Oozie and Cron scheduling.
  • Implemented automation pipelines in Airflow for multiple development platforms orchestration.
  • Developed various integration pipelines using Apache NiFi and Talend.
  • Developed ingestion pipelines from various RDBMS sources to HDFS and Hive using Talend.
  • Experienced in building highly scalable Big-data solutions using Hadoop and multiple distributions i.e., Cloudera, Hortonworks, and NoSQL platforms (HBase & Cassandra).
  • Expert in data lake ingestion setup in Hive and HBase for historical and incremental purpose.
  • Working experience with NoSQL platforms like HBase, MongoDB, and Cassandra.
  • Experience in working on multiple cloud environments like AWS, Azure, and GCP.
  • Expert in working with AWS tools like S3, RDS, Redshift, ElastiCache, and DynamoDB.
  • Implemented various Python notebooks in Azure Databricks for analytics purposes.
  • Experience ingesting data from Event Hub into Azure SQL for analysis.
  • Expert in handling Azure tools like Azure Data Factory, Azure Stream Analytics, Azure HDInsight, and Cosmos DB for implementing end-to-end data pipelines.
  • Experience in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinating tasks among the team.
  • Extensive ETL tool experience using IBM InfoSphere/WebSphere DataStage and Ascential DataStage.
  • Exposure to GCP tools like BigQuery, Cloud SQL, Pub/Sub, GCS, and Dataproc.
  • Expert in writing automation scripts in languages like Python, Bash, and Shell.
  • Experience in working with different BI tools like Tableau, QlikView, Domo, and Power BI.
  • Expert in implementing code coverage for application development using Sonar.
  • Working knowledge on container orchestration tools like Docker and Kubernetes.
  • Developed different modules in application development using microservices in Java.
  • Experience in working on different build tools like Maven, Gradle and SBT.
  • Implemented various application and data pipelines using IDE tools like IntelliJ and Eclipse.
  • Experience with Snowflake Multi-Cluster Warehouses.
  • Exposure to application development technologies like Snowflake and Druid.
  • Exposed application metrics and logs using tools like Kibana and Grafana.

Certification

  • Certified Professional Data Engineer, Google Cloud Platform (GCP) - 2023 - 2025

Timeline

Sr. Data Engineer

HireRight - Accenture
10.2021 - Current

Data Engineer

The Weather Company, IBM
12.2019 - 10.2021

Data Engineer

Wafer Space
11.2017 - 12.2019

Master’s in Data Analytics -

Northeastern University

Bachelor of Technology -

Amrita School of Engineering