Anerug Veaeprhanani

Summary

  • Results-oriented IT professional with a strong eye for detail and 8 years of experience across the big data tech stack, ETL tools, data warehousing, and machine learning model hardening.
  • Strong understanding of distributed systems concepts, including fault tolerance, scalability, and data consistency; experienced in designing, implementing, and maintaining Kafka-based data pipelines for real-time data processing, including data ingestion, transformation, and streaming analytics.
  • Expertise in designing and developing big data solutions on Hadoop using Spark, HDFS, HBase, Hive, and Impala, including performance tuning of Spark-based applications.
  • Expertise in hardening machine learning models on containerized platforms such as OpenShift and in building reusable components for data pipelines.
  • Excellent communication, interpersonal, and problem-solving skills; a strong team player with a positive attitude.

Overview

10 years of professional experience

Work History

Senior Data Engineer

Apexon
08.2020 - Current
  • Designed and built batch and real-time streaming (Kafka-based) data quality applications for files in the raw layer and for critical data elements between semantic layers.
  • Hardened various supervised and unsupervised ML models, as well as rule engines built using Natural Language Processing (NLP), on containerized platforms such as OpenShift using CI/CD pipelines on GitLab.
  • Built ingestion services for actively controlling the data ingested into the platform using AWS Glue, S3, Python, Spark, SQS, Lambda, SNS and EMR.
  • Built reusable components for cloud data migration using Databricks notebooks.
  • Built historical timestamping of data in data model tables using SCD Type 1, SCD Type 2, and SCD Type 4 (mini-dimensions).
  • Guided junior developers.
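The SCD Type 2 timestamping mentioned above can be sketched as follows. This is a minimal illustration, not the production implementation: the row schema (key, attr, start_date, end_date, is_current) and the function name are hypothetical simplifications of a real dimension table.

```python
from datetime import date

def apply_scd_type2(dim_rows, incoming, today=None):
    """Apply a Type 2 slowly changing dimension update.

    dim_rows: historical dimension rows, each a dict with keys
              key, attr, start_date, end_date, is_current.
    incoming: new source records, each a dict with keys key, attr.
    Changed rows are closed out (end_date set, is_current False) and a
    new current version is appended; unchanged rows pass through as-is.
    """
    today = today or date.today()
    current = {r["key"]: r for r in dim_rows if r["is_current"]}
    out = list(dim_rows)
    for rec in incoming:
        existing = current.get(rec["key"])
        if existing is None:
            # Brand-new key: insert its first version.
            out.append({"key": rec["key"], "attr": rec["attr"],
                        "start_date": today, "end_date": None,
                        "is_current": True})
        elif existing["attr"] != rec["attr"]:
            # Attribute changed: close the old version, open a new one.
            existing["end_date"] = today
            existing["is_current"] = False
            out.append({"key": rec["key"], "attr": rec["attr"],
                        "start_date": today, "end_date": None,
                        "is_current": True})
    return out
```

The same close-out-and-insert pattern is what a warehouse-side MERGE statement would express; in SCD Type 1 the old row would instead be overwritten in place, losing history.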

Application Developer

ERP Analysts Inc.
12.2019 - 08.2020
  • Built an ingestion services framework in Python for moving data from disparate sources in multiple file formats, including ASCII, EBCDIC, XML, JSON, and Parquet.
  • Built provisioning services using an outbound process to send extracts, along with a token file, to pre-defined external client locations using Spark, Java, and SQL.
  • Built reusable components to migrate legacy applications as-is to the data platform, retrofitting them to use core services such as data quality, job/plan orchestration, and the auditing/logging framework.

Application Developer

Gathi Analytics LLC
06.2019 - 11.2019
  • Built a data profiling tool in Python that analyzes source system data to determine data relationships (for data integration), design constructs, consistency, and quality in support of analysis and data modeling.
  • Created and maintained optimal data pipelines based on platform and application requirements using Informatica, Azure SQL Data Warehouse, and Azure SQL Database on Azure Cloud.
  • Created reusable components for data pipelines and scripts to maintain Marketing Analytics data marts on Azure SQL Data Warehouse and Azure SQL Database, supporting machine learning models on the Data Lab, using IICS (Informatica Intelligent Cloud Services) and the big data stack (Spark, Hive, Hadoop, Impala, etc.).
  • Optimized SQL queries and long running jobs to meet service level agreements.
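The per-column profiling such a tool performs can be sketched as below. This is a minimal illustration under assumed inputs (records as dicts, three basic metrics), not the tool's actual design; the function name and metric set are hypothetical.

```python
def profile_columns(rows):
    """Minimal per-column profile: row count, null count, distinct count.

    rows: list of dicts sharing the same keys (one dict per record).
    Returns {column: {"rows": n, "nulls": n, "distinct": n}}.
    """
    profile = {}
    if not rows:
        return profile
    for col in rows[0]:
        values = [r.get(col) for r in rows]
        non_null = [v for v in values if v is not None]
        profile[col] = {
            "rows": len(values),
            # Missing keys and explicit None both count as nulls here.
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
        }
    return profile
```

A distinct count equal to the row count flags a candidate key; a high null ratio flags a column to question before modeling, which is the kind of signal such profiling feeds into data integration work.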

Application Developer

ERP Analysts Inc.
06.2017 - 05.2019
  • Developed microservices in Python for data cleansing, standardization, transformation, and loading in adherence to industry standards and methodologies.
  • Built RESTful APIs using Python frameworks (Django, Flask) to archive and retrieve integration and semantic zone data, with indexing, searching, and purging, and for workload management.
  • Built user interfaces using HTML, CSS, and JavaScript for workload management and for an operational dashboard used to monitor jobs, re-runs, and alerts and to view errors/logs.
  • Migrated microservices from batch to event-driven data flow and processing using Kafka, integrated with the auditing and job execution framework to enable real-time processing.
  • Built preprocessing scripts and scheduled jobs for multiple data sources using orchestration frameworks such as Airflow and Control-M.
  • Designed and developed semantic layer applications to load data into data marts and built alerts with reference data to run compliance tests.

Assistant System Engineer Trainee

TATA Consultancy Services
03.2015 - 12.2015
  • Developed an in-house library for a multinational client, applying comprehensive knowledge of data structures and algorithms such as linked lists, stacks, queues, and searching and sorting algorithms.
  • Provided L3 production support for client applications.


Education

Master's - Applied Computer Science

Northwest Missouri State University
04.2017

Bachelor's - Technology

NIT Trichy
08.2014

Skills

Big Data tech stack

Hadoop, HDFS, Apache Spark, Hive, Impala, HBase, Sqoop, Kafka

File Formats

Parquet, Fixed Width, ASCII, JSON, EBCDIC, CSV, AVRO

Programming/Scripting Languages/ML stack

Python, Java, SQL, Unix/Linux Shell Scripting, Pandas, NumPy, Matplotlib, Seaborn, Scikit-Learn, spaCy, XGBoost

Containerization Platforms/Engine

OpenShift, Docker

Job Orchestration Tools

Airflow, Control-M

Databases/Data warehouses

MySQL, PostgreSQL, SQL Server, Netezza, HBase, MongoDB, Snowflake, Hive
