
Anvesh N

Houston

Summary

  • 11+ years of experience with data engineering technologies: Big Data, Hadoop, PySpark, cloud platforms (AWS, Azure Databricks, GCP), databases, and data warehousing (Amazon S3, Azure Delta Lake, BigQuery).
  • Extensive work with Hadoop ecosystem components: Spark, Hive, Sqoop, Oozie, YARN, and HDFS.
  • Strong SQL experience, with in-depth knowledge of Oracle, MySQL, and core database concepts.
  • Good experience with Python functional, scripting, and object-oriented programming.
  • Worked with the Spark Python API on the current project, with a good understanding of the Scala API.
  • Implemented highly complex transformations and business logic in Spark and Hive.
  • Extensive experience developing Spark Core and Spark SQL applications on Spark 3.x, 2.x, and 1.6.
  • Strong knowledge of Spark concepts such as RDDs, DataFrames, the Catalyst and Tungsten optimizers, broadcast variables, accumulators, executors, cores, tasks, parallelism, and partitions.
  • Experienced in reading Spark application execution plans, tracking resource utilization in the Spark web UI, and applying optimization techniques.
  • Strong experience in developing Hive queries, explain-plan analysis, and performance tuning.
  • Hands-on experience with shell scripting; used Airflow for orchestration.
  • Worked with messaging systems such as Kafka and AWS queues, and with file formats including CSV, XML, SWIFT, Parquet, and ZIP.
  • Assessed data processing needs and provisioned clusters, task instances, spot servers, and big data services using advanced configuration options.
  • Developed a Spark Streaming proof of concept with a custom JMS API and presented it to architects.
  • Experienced with developer and SDLC tools such as PyCharm, Visual Studio, PuTTY, SQL Developer, Excel, MS Visio, MS Word, Confluence, and ServiceNow.
  • Experienced in preparing project documentation, release notes, and runbooks.
  • Extensive experience in root cause analysis and debugging production system logs.
  • Attended knowledge enrichment sessions (KES), shared technical and functional knowledge, and delivered KES sessions on technical topics.

Overview

14 years of professional experience

Work History

Sr. Data Engineer

Neiman Marcus Group Ltd
06.2023 - 04.2025
  • Created high-level and low-level design documentation for various modules, reviewed designs against standards, patterns, and company guidelines, and verified design specifications against proofs of concept and technical considerations.
  • Protected data with role-based access controls, fine-grained authorization, and custom policies to meet organizational goals.
  • Designed, built, and managed scalable data pipelines with Azure and Databricks cloud services.
  • Collaborated with cross-functional teams to collect and comprehend data needed for sales, marketing, machine learning, and data science projects.
  • Significant contributor to building a Single Source of Truth (SSOT) platform that onboards massive volumes of batch and real-time streaming data using modern cloud data services, delivering 12x faster time-to-market (use cases deployed in under 2 months) and a 28% increase in ROI for the client's data products compared to the siloed legacy data systems.
  • Developed custom analytical Spark SQL queries that improved inventory management and reduced costs, leading to a 35% reduction in out-of-stock items and $4.5 million in annual inventory savings.
  • Evaluated emerging technologies and tools to identify opportunities for enhancing existing systems or creating new ones.
  • Day-to-day responsibilities included developing ETL pipelines into and out of the data warehouse and building major regulatory and financial reports using advanced SQL queries in Snowflake.
  • Built Databricks workflow pipelines for transferring data from on-premises Oracle databases to the Azure cloud.
  • Used Apache Iceberg to build modern data lakes and lakehouses, simplifying data management and enabling more advanced analytical workloads.
  • Leveraged advanced analytics tools to create interactive dashboards that provided actionable insights into key business metrics.
  • Performed day-to-day integration work with the DB team to verify that database tables, columns, and metadata were correctly loaded into Snowflake's development, stage, and production environments.
  • Ensured that artifacts such as mapping and data lineage documentation were kept up to date.
  • Automated continuous code delivery (CI/CD) to various target environments using Cloud Chef, which helps manage releases and promotions across environments.

Sr. Data Engineer

SJSU, Higher Education Vertical
07.2021 - 02.2023
  • Participated in requirement gathering meetings to better grasp the expectations and collaborated with system analysts to understand the structure and patterns of the upstream source data.
  • Built a data ingestion framework from scratch using GCP cloud services.
  • Handled data ingestion, transformation, processing, and computation on GCP Dataproc using PySpark.
  • Loaded processed data into BigQuery, enabling creation of dashboards for student enrollment and assessment.
  • Worked on messaging systems such as Cloud Pub/Sub and Kafka.
  • Tuned ETL pipelines using join, bucketing, and partitioning strategies, improving data processing efficiency by 30%.
  • Used Dataproc to transform raw data into structured formats for storage in BigQuery.
  • Built Airflow pipelines on GCP Cloud Composer to transfer data from on-premises Oracle databases to the GCP cloud.
  • Worked on environment upgrades and automation in GCP.
  • Worked on dimensional data modeling with star and snowflake schemas and Slowly Changing Dimensions (SCD).
  • Applied AI/ML models to forecast students' likely academic success (e.g., course grades, GPA, graduation likelihood, dropout risk).
  • Applied AI/ML models to make course recommendations using data such as student learning preferences, performance on assignments, engagement with specific learning modules, and content consumption patterns.

Data Engineer

Starbucks
11.2020 - 04.2021
  • Built a data lake (Azure Data Lake) for inventory management using Azure Databricks.
  • Contributed to agile development practices, including sprint planning, code reviews, and continuous integration and deployment, to ensure high-quality data products for end users.
  • Developed a predictive model using Tableau and SQL to identify at-risk clients and implement proactive retention actions.
  • Automated data extraction and transformation operations using SQL and ETL technologies, reducing manual reporting time.
  • Monitored data pipelines and proactively resolved issues with data ingestion, transformation, and quality to maintain data correctness and dependability.
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation across multiple file formats, analyzing the data to uncover insights into customer usage patterns.
  • Applied normalization principles to improve performance, and wrote PL/SQL ETL code to extract, transform, cleanse, and load data from source to target data structures.


Data Engineer

Thales Group
10.2018 - 10.2020
  • Assisted in migrating historical data systems to a new data warehouse while maintaining data integrity and continuity.
  • Supported ad hoc data analysis requests from several departments, delivering usable insights within tight timeframes.
  • Conducted A/B tests to assess the efficacy of various techniques and provide data-driven recommendations for optimization.
  • Analyzed ticket bookings to develop a pricing mechanism.
  • Optimized prices in real time using predictive analytics techniques.

Software Engineer

Nielsen
09.2015 - 09.2018
  • Documented data engineering processes, workflows, and systems to maintain a knowledge repository, and collaborated with team members and stakeholders to understand requirements and provide technical support.
  • Assisted in the design and implementation of data models and schemas for databases and data warehousing systems to ensure efficient data storage and retrieval.
  • Involved in writing SQL queries and stored procedures for the application in Oracle.
  • Kept customer data up to date in real time.
  • Built an understanding of consumers' tastes and behavior to showcase personalized products and services.

SQL Developer

Mybank Sdn Bhd
12.2010 - 11.2012
  • Involved in writing SQL queries and stored procedures for the application in Oracle.
  • Designed and developed schema data models.
  • Documented business workflows for stakeholder review.
  • Created new tables, sequences, views, procedures, cursors, and triggers for database development.
  • Provided level 4 customer support by analyzing issues and assigning them to the appropriate team for resolution.

Skills

  • Hadoop ecosystem: Hive, Sqoop, HDFS, YARN, Mesos
  • Spark: Spark Core, Spark SQL, Spark Streaming
  • Cloud: AWS, GCP, Azure, and Databricks
  • AWS-specific tools: AWS Glue, Redshift, SageMaker, MWAA, IAM, Secrets Manager, Athena, S3, AWS EMR
  • GCP-specific tools: Cloud Storage, Cloud Data Fusion, Cloud Composer
  • Azure-specific tools: ADF, ADLS, Blob Storage, Synapse, Active Directory
  • AI: Machine learning and deep learning algorithms
  • Data warehouse: Snowflake, Delta Lake
  • DevOps: Jenkins, Ansible, Azure DevOps
  • Dashboards: Tableau, Qlik Sense
  • Messaging systems: Kafka, Spark Structured Streaming
  • ETL tools: Azure ADF, Informatica
  • Programming languages: Python, C
  • Scripting: Python, Shell, SQL

Best Team Member of the Year at Neiman Marcus

Adopted and implemented advanced tools and technologies to meet client requirements.
