
Anvesh N

Houston, TX

Summary

  • More than 10 years of experience in data engineering technologies, including Big Data, Hadoop, PySpark, cloud platforms (AWS Databricks, GCP, Azure), databases, and data warehousing (Amazon S3, Azure Delta Lake, BigQuery).
  • Worked extensively with Big Data technologies in the Hadoop ecosystem (Spark, Hive, Sqoop, Oozie, YARN, and HDFS).
  • Strong experience with SQL and in-depth knowledge of Oracle and MySQL databases and database concepts.
  • Good experience with and knowledge of Python functional, scripting, and object-oriented programming.
  • In the current project, worked with the Spark Python API; also have a good understanding of the Scala API.
  • Implemented highly complex transformations and business logic in Spark and Hive.
  • Extensive experience developing Spark Core and Spark SQL applications using Spark 3.x, 2.x, and 1.6.
  • Strong knowledge of Spark concepts such as RDDs, DataFrames, the Catalyst optimizer, the Tungsten execution engine, broadcast variables, accumulators, executors, cores, tasks, parallelism, and partitions.
  • Experienced in interpreting Spark application execution plans, resource utilization, the Spark web UI, and optimization techniques.
  • Strong experience in developing Hive queries, explain-plan analysis, and performance tuning.
  • Hands-on experience with shell scripting.
  • Worked with Airflow for orchestration.
  • Worked with messaging systems such as Kafka and AWS queues.
  • Worked with different file formats such as CSV, XML, Swift, Parquet, and Zip.
  • Experienced in assessing data processing needs and procuring clusters, task instances, spot servers, and big data technologies from advanced options.
  • Developed a POC on Spark Streaming with a custom JMS API and presented it to architects.
  • Experienced with developer tools and SDLC applications such as PyCharm, Visual Studio, PuTTY, SQL Developer, Excel, MS Visio, MS Word, Confluence, and ServiceNow.
  • Experienced in preparing project documentation, release notes, and runbooks.
  • Extensive experience in root cause analysis and debugging production system logs.
  • Attended knowledge enrichment sessions (KES), shared technical and functional knowledge, and delivered KES sessions on technical topics.

Overview

14 years of professional experience

Work History

Data Architect

Neiman Marcus Group Ltd
06.2023 - 04.2025
  • Created high-level and low-level design documentation for various modules. Reviewed designs to ensure they met standards, patterns, and company guidelines. Verified design specifications against proofs of concept and technical considerations.
  • Protected data with role-based access controls, fine-grained authorization, and custom policies to meet organizational goals.
  • Designed, built, and managed scalable data pipelines with AWS cloud services and Snowflake.
  • Collaborated with cross-functional teams to collect and comprehend data needed for sales, marketing, machine learning, and data science projects.
  • Contributed significantly to building the SSOT (Single Source of Truth) platform, which onboards massive amounts of batch and real-time streaming data using modern cloud data services. The platform achieved a 12x faster time-to-market, with use cases deployed in under 2 months, and led to a 28% increase in ROI for the client's data products compared to the siloed legacy data systems.
  • Developed custom analytical Spark SQL queries that improved inventory management and reduced costs, leading to a 35% reduction in out-of-stock items and $4.5 million in annual inventory savings.
  • Evaluated emerging technologies and tools to identify opportunities for enhancing existing systems or creating new ones.
  • Day-to-day responsibilities included developing ETL pipelines into and out of the data warehouse and developing major regulatory and financial reports using advanced SQL queries in Snowflake.
  • Leveraged advanced analytics tools to create interactive dashboards that provided actionable insights into key business metrics.
  • Performed day-to-day integrations with the DB team to verify that database tables, columns, and metadata were correctly inserted into Snowflake's development, stage, and production environments.
  • Ensured that artifacts such as mapping and data lineage documentation were kept up to date.

Sr. Data Engineer

SJSU, Higher Education Vertical
07.2021 - 02.2023
  • Participated in requirement gathering meetings to better grasp the expectations and collaborated with system analysts to understand the structure and patterns of the upstream source data.
  • Built a data ingestion framework from scratch using GCP cloud services.
  • Handled data ingestion, transformation, processing, and computation with GCP Dataproc using PySpark.
  • Loaded processed data into BigQuery, enabling creation of dashboards for student enrollment and assessment.
  • Worked with messaging systems such as Cloud Pub/Sub and Kafka.
  • Tuned ETL pipelines, improving data processing efficiency by 30% through join optimization, bucketing, and partitioning.
  • Used Dataproc for transforming raw data into structured formats for storing data in BigQuery.
  • Built Airflow pipelines on Cloud Composer for transferring data from on-premises Oracle databases to GCP.
  • Worked on environment upgrades and automation in GCP.
  • Worked on dimensional data modelling in Star and Snowflake schemas and Slowly Changing Dimensions (SCD).

Data Engineer

Starbucks
11.2020 - 04.2021
  • Built a data lake (Azure Data Lake) for inventory management using Azure Databricks.
  • Contributed to agile development procedures, including sprint planning, code reviews, and continuous integration and deployment, to ensure high-quality data products for end users.
  • Developed a predictive model using Tableau and SQL to identify at-risk clients and implement proactive retention actions.
  • Automated data extraction and transformation operations using SQL and ETL technologies, reducing manual reporting time.
  • Monitored data pipelines and proactively resolved issues with data input, transformation, and quality to maintain data correctness and dependability.
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, uncovering insights into customer usage patterns.
  • Applied normalization principles to improve performance. Developed ETL code in PL/SQL to extract, transform, cleanse, and load data from source to target data structures.


Data Engineer

Thales Group
10.2018 - 10.2020
  • Analyzed changing passenger preferences.
  • Analyzed ticket bookings.
  • Optimized pricing in real time using predictive analysis techniques.

Software Engineer

Nielsen
09.2015 - 09.2018
  • Kept customer data up to date in real time.
  • Analyzed consumers’ tastes and behavior.
  • Showcased personalized products and services.

SQL Developer

Mybank Sdn Bhd
12.2010 - 11.2012
  • Wrote optimized SQL queries.
  • Validated data by building SQL queries.
  • Provided level 4 customer support by analysing issues and assigning them to the respective teams for resolution.

Skills

  • Hadoop Ecosystem: Hive, Sqoop, HDFS, YARN, Mesos
  • Spark: Spark Core, Spark SQL, Spark Streaming
  • Cloud: AWS, GCP, Azure, and Databricks
  • AI: Machine learning, deep learning algorithms
  • Data Warehouse: Snowflake, Delta Lake
  • DevOps: Jenkins, Ansible, Azure DevOps
  • Dashboard: Tableau, Qlik Sense
  • Messaging System: Kafka with Spark Structured Streaming
  • ETL Tools: Azure ADF, Informatica
  • Programming Language: Python, C, C
  • Scripting: Python, Shell

Best Team Member of the Year - Thought Focus

Adopted and implemented advanced tools and technologies to meet end-client requirements.
