
DIVAKAR A

Summary

Practical Database Engineer with in-depth knowledge of data manipulation techniques and computer programming, paired with expertise in integrating and implementing new software packages and products into existing systems. Offers a 10-year background managing development, design, and delivery of database solutions. Tech-savvy, independent professional with outstanding communication and organizational abilities.

Overview

10 years of professional experience

Work History

Sr. Data Engineer

Progressive Insurance
Mayfield, Ohio
02.2022 - Current
  • Designed and implemented data pipelines, ETLs, warehouses and reporting systems leveraging technologies like Python, Spark, Dataflow, Airflow, Snowflake and Databricks to enable data-driven advertising solutions
  • Developed data engineering pipeline that reduced processing time by 50%, resulting in cost savings
  • Designed and deployed real-time data processing system, increasing data processing speed by 75% and enabling timely decision-making
  • Implemented data quality checks and monitoring system, leading to a 25% decrease in data errors and improved data accuracy
  • Developed Spark programs in Python, applying functional programming principles to process complex structured data sets
  • Applied Agile methodologies in design and development of ETL applications and data processing workflows
  • Developed infrastructure as code solutions using CDK, enabling efficient management of AWS resources for GenAI applications
  • Designed and developed high-performance data architectures supporting data warehousing, real-time ETL, and batch big data processing
  • Integrated Boto3 into data engineering pipelines for seamless interaction with AWS services, enabling robust retrieval and generation processes for GenAI applications
  • Developed and optimized complex SQL queries, stored procedures, and triggers for relational database management systems (RDBMS), ensuring efficient data retrieval and manipulation
  • Worked with Hadoop infrastructure to store data in HDFS and used Spark/Hive SQL to migrate the underlying SQL codebase to AWS
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and PySpark
  • Analyzed SQL scripts and designed solutions for implementation in PySpark
  • Exported tables from Teradata to HDFS using Sqoop and built tables in Hive
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/big data concepts
  • Used Spark SQL to load JSON data, create schema RDDs, and load them into Hive tables, handling structured data with Spark SQL (see the illustrative sketch at the end of this role's entry)
  • Designed and implemented PL/SQL packages and procedures to support data processing workflows
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data
  • Designed and implemented complex Graph databases, optimizing them for efficient data retrieval and traversal
  • Implemented best practices in Databricks notebooks and jobs, ensuring maintainability and scalability of data processing tasks
  • Working experience with financial regulatory reports such as 2052A, FR-Y9C, 14Q, and 10K-Q
  • Served as the primary on-site ETL developer during the analysis, planning, design, development, and implementation stages of projects using IBM WebSphere software (QualityStage v9.1, Web Service, Information Analyzer, Profile Stage)
  • Implemented container orchestration strategies, leveraging Kubernetes for automating deployment, scaling, and management of Docker containers, streamlining application deployment processes
  • Developed complex T-SQL queries for data extraction, transformation, and reporting purposes
  • Utilized GCP services like BigQuery, Cloud Storage and DataProc to build and orchestrate data pipelines, leveraging the scalability of the cloud
  • Authored technical design and documentation outlining data systems architecture, ETL processes, and proposed enhancements
  • Implemented infrastructure-as-code practices on GCP using Terraform, enabling repeatable and consistent provisioning of cloud resources
  • Worked with SQL Server components including SSIS (SQL Server Integration Services), SSAS (Analysis Services), and SSRS (Reporting Services)
  • Used Informatica, SSIS, SPSS, and SAS to extract, transform, and load source data from transaction systems
  • Wrote scripts against Oracle, SQL Server, and Netezza databases to extract data for reporting and analysis; imported and cleansed high-volume data from sources such as DB2, Oracle, and flat files into SQL Server
  • Built end-to-end data lakes on Cloud Storage (GCP), leveraging its unlimited scalability and geo-redundancy for resilient and cost-effective data storage
  • Evaluated big data technologies and prototype solutions to improve our data processing architecture
  • Data modeling, development, and administration of relational and NoSQL databases (BigQuery, Elasticsearch)
  • Leveraged Erwin Data Modeler to design and implement complex data models, ensuring efficient database structures
  • Led the development and maintenance of both Relational and NoSQL data models for comprehensive Data Warehouse solutions
  • Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables and OLTP reporting
  • Proficient in utilizing TOAD for Oracle to streamline database development processes
  • Wrote and optimized SQL queries, stored procedures, and triggers, contributing to improved data retrieval and manipulation
  • Designed and implemented end-to-end data workflows using Apache Airflow, optimizing data processing and ensuring scalability
  • Hands-on experience with the Dataiku visualization tool
  • Environment: IBM InfoSphere DataStage 9.1/11.5, Oracle 11g, flat files, Autosys, GCP, UNIX, Erwin, TOAD, MS SQL Server database, XML files, AWS, MS Access database.
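
To illustrate the kind of Spark SQL work described above (loading JSON, deriving a schema, and writing the result to Hive), the following is a minimal PySpark sketch. The file path, column names, and database/table names are hypothetical placeholders, and it assumes a Hive-enabled Spark session on the cluster.

    from pyspark.sql import SparkSession

    # Hive support is required to persist managed tables (assumes a Hive
    # metastore is reachable from the cluster).
    spark = (SparkSession.builder
             .appName("json-to-hive")                      # hypothetical app name
             .enableHiveSupport()
             .getOrCreate())

    # Spark infers the schema from the semi-structured JSON input.
    events = spark.read.json("hdfs:///data/raw/events/")   # hypothetical path

    # Light clean-up with Spark SQL before persisting.
    events.createOrReplaceTempView("events_raw")
    cleaned = spark.sql("""
        SELECT id, event_type, CAST(event_ts AS timestamp) AS event_ts
        FROM events_raw
        WHERE id IS NOT NULL
    """)

    # Persist the result as a Hive table (hypothetical database/table name).
    cleaned.write.mode("overwrite").saveAsTable("analytics.events_cleaned")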

Sr. Data Engineer

Arvest Bank
Bentonville, Arkansas
04.2020 - 12.2021
  • This project was focused on customer clustering
  • Used the DataStage Director to schedule and run ETL jobs, test and debug job components, and monitor performance statistics
  • Implemented advanced analytics in data engineering pipelines, leading to a 30% improvement in fraud detection
  • Installed Hadoop, MapReduce, and HDFS on AWS and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing
  • Designed and optimized high performance data pipelines and ETL processes leveraging GCP services like Dataflow, DataProc, Composer, and BigQuery
  • Orchestrated and managed containerized applications effectively by deploying and maintaining Kubernetes clusters
  • Utilized RDBMS platforms (MySQL, PostgreSQL) for designing and implementing data models, ensuring proper normalization and adherence to database design principles
  • Architected, Designed and Developed Business applications and Data marts for reporting
  • Engineered automation scripts using Boto3 to interact with AWS services programmatically, enhancing operational efficiency in managing AI/ML workloads
  • Authored documentation, design specifications, and best-practice guides for GCP data engineering solutions
  • Implemented Spark GraphX application to analyze guest behavior for data science segments
  • Managed version control in Bitbucket for data engineering projects
  • Developed Big Data solutions focused on pattern matching and predictive modeling
  • Managed Oracle databases, including installation, configuration, and ongoing administration tasks
  • Migrated Hadoop workloads from on premise to Cloud DataProc (GCP), taking advantage of auto-scaling clusters and managed infrastructure
  • Created custom operators in Python to extend Airflow functionality, tailored to specific data processing requirements and integrations with diverse systems (see the illustrative sketch at the end of this role's entry)
  • Implemented secure data sharing and access controls within Snowflake, maintaining data governance standards
  • Designed and implemented data engineering workflows on Databricks, facilitating seamless ETL processes
  • Led the implementation of a robust data lake infrastructure on GCP, ensuring secure storage and governance of diverse data types
  • Explored Spark for improving performance and optimizing existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN
  • Completed a highly immersive data science program covering data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop
  • Utilized expertise in Snowflake and BigQuery to build and maintain data warehouses, contributing to improved data accessibility
  • Leveraged CDK to define cloud infrastructure in TypeScript/Python, ensuring consistency and scalability across development and production environments
  • Involved in creating UNIX shell scripts for database connectivity and executing queries in parallel job execution
  • Led the development and optimization of complex data architectures on RDBMS platforms, enhancing scalability and supporting critical business operations
  • Implemented Terraform modules for automating the deployment of cloud resources and maintaining infrastructure consistency
  • Worked closely with the ETL Developers in designing and planning the ETL requirements for reporting, as well as with business and IT management in the dissemination of project progress updates, risks, and issues
  • Utilized version control systems for managing database changes and ensuring a streamlined development process
  • Developed and executed data processing workloads on DataProc clusters (GCP), leveraging the elasticity and auto-scaling capabilities of the cloud
  • Handled importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS
  • Worked in AWS environment for development and deployment of custom Hadoop applications
  • Implemented comprehensive data governance strategies using Erwin, defining and documenting data standards, ensuring consistency, and facilitating collaboration across teams
  • Worked on Data modeling, Advanced SQL with Columnar Databases using AWS
  • Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases (Oracle) for product level forecast
  • Extracted the data from Teradata into HDFS using Sqoop
  • Tracking and resolving data engineering issues efficiently using Jira
  • Demonstrated advanced SQL techniques and data structure understanding to streamline data processing
  • Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau, and Power BI
  • Played a crucial role in ensuring data integrity and security throughout the data lifecycle
  • Environment: Python, SQL server, Hadoop, HDFS, HBase, Map Reduce, Hive, Impala, Pig, Sqoop, Mahout, LSTM, RNN, Spark MLlib, MongoDB, AWS, Tableau, Unix/Linux.
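
As a companion to the custom Airflow operator work noted above, below is a minimal sketch of what such an operator can look like. The operator name, its parameters, and the placeholder row-count helper are hypothetical illustrations rather than details from the original pipelines, and it assumes Airflow 2.x.

    from airflow.models.baseoperator import BaseOperator


    class DataQualityCheckOperator(BaseOperator):
        """Hypothetical operator that fails a task when a table looks empty."""

        def __init__(self, table: str, min_rows: int = 1, **kwargs):
            super().__init__(**kwargs)
            self.table = table
            self.min_rows = min_rows

        def execute(self, context):
            # A real pipeline would query the warehouse here, e.g. through a
            # provider hook; the helper below is a stand-in for that call.
            row_count = self._fetch_row_count(self.table)
            if row_count < self.min_rows:
                raise ValueError(
                    f"{self.table} returned {row_count} rows, "
                    f"expected at least {self.min_rows}"
                )
            self.log.info("%s passed with %d rows", self.table, row_count)

        def _fetch_row_count(self, table: str) -> int:
            # Placeholder; swap in a hook for the target system.
            return 0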

Sr. Data Engineer

Cardinal Health
Dublin, Ohio
07.2018 - 03.2020
  • Implemented a robust data engineering solution that enhanced Cardinal Health's supply chain resilience, reducing lead times by 30% and ensuring timely delivery of critical healthcare products
  • Involved in designing data warehouses and data lakes on regular (Oracle, SQL Server) high performance on big data (Hadoop - Hive and HBase) databases
  • Performed data modeling and designed, implemented, and deployed high-performance custom applications at scale on Hadoop/Spark
  • Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy DB2 and SQL Server database systems
  • Translated business requirements into working logical and physical data models for OLTP and OLAP systems
  • Employed TOAD for performance monitoring, tuning, and optimization of SQL queries and database processes
  • Created BTEQ, FastExport, MultiLoad, TPump, and FastLoad scripts for extracting data from various production systems
  • Spearheaded automation initiatives, employing Terraform to streamline the provisioning and configuration of cloud resources
  • Reviewed Stored Procedures for reports and wrote test queries against the source system (SQL Server- SSRS) to match the results with the actual report against the Datamart (Oracle)
  • Led the implementation of Snowflake as a cloud-based data warehousing solution, providing a scalable and elastic platform for data storage
  • Implemented CI/CD pipelines with Bitbucket Pipelines for GCP data solutions
  • Implemented T-SQL scripts for data migration and synchronization between different database systems
  • Integrated Apache Airflow with cloud services to leverage cloud-native features for scalable and efficient data processing
  • Established robust data governance practices across GCP, including access controls, encryption, auditing and monitoring to ensure security and compliance
  • Leveraged BigQuery for analytical processing, executing complex queries on large datasets to extract valuable business insights
  • Developed end-to-end ETL pipelines that processed over 100 million rows daily with minimal failures, meeting strict SLAs
  • Created data pipelines using state-of-the-art big data frameworks and tools
  • Designed and implemented automated CI/CD pipelines using Jenkins
  • Stored DataFrames as Hive tables using Python (PySpark)
  • Ingested data into HDFS from relational databases such as Teradata using Sqoop and exported data back to Teradata for storage
  • Worked extensively with Avro and Parquet files; parsed semi-structured JSON data and converted it to Parquet using DataFrames in Spark
  • Converted all Hadoop jobs to run in EMR by configuring the cluster according to the data size
  • Implemented Spring Security to protect against SQL injection and manage user access privileges; used Java/J2EE design patterns such as DAO, DTO, and Singleton
  • Created Hive tables with partitioning and bucketing
  • Performed data analysis and data profiling using complex SQL queries on various source systems, including Oracle 10g/11g and SQL Server 2012
  • Led optimization and performance tuning of big data jobs and SQL queries, reducing run times by 45%
  • Wrote Spark applications for Data validation, cleansing, transformations and custom aggregations
  • Imported data from various sources into Spark RDD for processing
  • Developed custom aggregate functions using Spark SQL and performed interactive querying
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slot configuration
  • Orchestrated efficient large-scale ETL workflows and their deployments, including testing of the workflows on AWS using S3, EMR, Athena, Glue, IAM, and Service Catalog
  • Created and managed CloudFormation templates for all AWS services an ETL workflow needs, such as IAM, S3, EMR, and Glue
  • Developed Spark applications for the entire batch processing by using Scala
  • Automatically scaled up EMR instances based on data volume
  • Stored the time-series transformed data from the Spark engine built on top of a Hive platform to Amazon S3 and Redshift
  • Facilitated deployment of multi-clustered environment using AWS EC2 and EMR apart from deploying Dockers for cross-functional deployment
  • Visualized the results using Tableau dashboards and the Python Seaborn libraries were used for Data interpretation in deployment
  • Utilized Data Engineering tools and frameworks to enhance productivity and output quality for fellow engineers, contributing to a more efficient and collaborative work environment
  • Worked with business owners/stakeholders to assess Risk impact, provided solution to business owners
  • Determined trends and significant data relationships using advanced statistical methods
  • Carried out data processing and statistical techniques such as sampling, estimation, hypothesis testing, time series, and correlation and regression analysis using R
  • Spearheaded the Glue- and EMR-based ETL workflow development initiative as subject matter expert and primary point of contact for fellow data engineers
  • Created proofs of concept for data quality and data profiling solutions using the AWS Deequ/PyDeequ framework along with Glue and EMR (see the illustrative sketch at the end of this role's entry)
  • Applied various data mining techniques: Linear Regression & Logistic Regression, classification, clustering
  • Took personal responsibility for meeting deadlines and delivering high quality work
  • Strived to continually improve existing methodologies, processes, and deliverable templates
  • Environment: R, SQL Server, Oracle, HDFS, Glue, HBase, AWS, MapReduce, Hive, Impala, Pig, Sqoop, NoSQL, Tableau, RNN, LSTM, Unix/Linux, Core Java.
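
To give a flavor of the Deequ/PyDeequ data-quality checks referenced above, here is a minimal PySpark sketch. The bucket path and column names are hypothetical, and the snippet assumes the Deequ jar coordinates exposed by the pydeequ package match the Spark version in use (recent PyDeequ releases also expect a SPARK_VERSION environment variable).

    import pydeequ
    from pyspark.sql import SparkSession
    from pydeequ.checks import Check, CheckLevel
    from pydeequ.verification import VerificationResult, VerificationSuite

    # PyDeequ needs the Deequ jar on the Spark classpath; the package exposes
    # the matching Maven coordinates.
    spark = (SparkSession.builder
             .config("spark.jars.packages", pydeequ.deequ_maven_coord)
             .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
             .getOrCreate())

    df = spark.read.parquet("s3://example-bucket/orders/")   # hypothetical path

    check = Check(spark, CheckLevel.Error, "order data quality")
    result = (VerificationSuite(spark)
              .onData(df)
              .addCheck(check
                        .isComplete("order_id")       # no nulls (hypothetical column)
                        .isUnique("order_id")         # no duplicates
                        .isNonNegative("amount"))     # hypothetical numeric column
              .run())

    # Convert the check results to a DataFrame for inspection or persistence.
    VerificationResult.checkResultsAsDataFrame(spark, result).show()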

Data Engineer

Aldi
Dallas, Texas
09.2015 - 06.2018
  • Involved in Analysis, Design and Implementation/translation of Business User requirements
  • Worked on collection of large data sets using Python scripting and Spark SQL
  • Worked on large sets of structured and unstructured data
  • Created deep learning models using LSTM and RNN architectures (see the illustrative sketch at the end of this role's entry)
  • Developed and automated data ingestion processes using Beam, Kafka, and Debezium for change data capture (CDC) from databases such as Oracle and MySQL
  • Developed Sqoop scripts to import/export data from relational sources and handled incremental loading of customer and transaction data by date
  • Developed Sqoop scripts to migrate data from Oracle to the big data environment
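
As a small illustration of the LSTM/RNN modeling mentioned in this role, the sketch below defines a minimal sequence classifier in Keras. The input shape, layer sizes, and random placeholder data are arbitrary assumptions for demonstration, not details from the original project.

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    # Hypothetical shape: sequences of 30 time steps with 8 features each.
    TIMESTEPS, FEATURES = 30, 8

    model = keras.Sequential([
        keras.Input(shape=(TIMESTEPS, FEATURES)),
        layers.LSTM(64),                          # recurrent feature extractor
        layers.Dense(1, activation="sigmoid"),    # binary output (assumption)
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])

    # Train on random placeholder data just to show the end-to-end call shape.
    X = np.random.rand(256, TIMESTEPS, FEATURES).astype("float32")
    y = np.random.randint(0, 2, size=(256, 1))
    model.fit(X, y, epochs=2, batch_size=32)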

Data Analyst/Engineer

NTT Ltd.
Dallas, Texas
09.2014 - 08.2015
  • Worked on various data flow and control flow tasks, including For Loop containers, Sequence containers, Script tasks, Execute SQL tasks, and package configuration
  • Created new procedures to handle complex logic for business and modified already existing stored procedures, functions, views and tables for new enhancements of the project and to resolve the existing defects
  • Loaded data from various sources into destination systems
  • Created batch jobs and configuration to create automated process using SSIS
  • Created SSIS packages to pull data from SQL Server and exported to Excel Spreadsheets and vice versa
  • Built SSIS packages to fetch files from remote locations such as FTP and SFTP, decrypt and transform them, load them into the data warehouse, and provide proper error handling and alerting
  • Extensive use of Expressions, Variables, Row Count in SSIS packages
  • Performed data validation and cleansing of staged input records before loading; used SSIS packages and data mappings to load OLE DB and flat-file sources into the SQL Server database and the data warehouse
  • Automated the process of extracting various flat/Excel files from sources such as FTP and SFTP (Secure FTP)
  • Deploying and scheduling reports using SSRS to generate daily, weekly, monthly and quarterly reports
  • Environment: MS SQL Server 2005 & 2008, SQL Server Business Intelligence Development Studio, SSIS- 2008, SSRS-2008, Report Builder, Office, Excel, Flat Files, .NET, T-SQL.

Education

Master of Science - Computer Engineering

UCO
Edmond, OK
05-2015

Skills

  • Data Warehousing
  • Data Modeling
  • Python Programming
  • API Development
  • NoSQL Databases
  • SQL
  • AWS
  • GCP
  • HDFS
  • MapReduce
  • Data Migration
  • Big data technologies
  • SQL and Databases
  • Data Visualization
  • Hadoop
  • Azure
  • BigQuery
  • ETL
  • PySpark
  • Hive
  • HBase
  • Kafka

Timeline

Sr. Data Engineer

Progressive Insurance
02.2022 - Current

Sr. Data Engineer

Arvest Bank
04.2020 - 12.2021

Sr. Data Engineer

Cardinal Health
07.2018 - 03.2020

Data Engineer

Aldi
09.2015 - 06.2018

Data Analyst/Engineer

NTT Ltd.
09.2014 - 08.2015

Master of Science - Computer Engineering

UCO