Mohan Vanukuri

Summary

  • Over 8 years of experience in Data Architecture, Design, Development, and Testing across OLTP and OLAP systems.
  • Expertise in Dimensional and Relational Data Modeling; skilled in managing Fact and Dimension tables.
  • Proficient in data modeling tools such as ERWin, Power Designer, and ER Studio for conceptual, logical, and physical modeling.
  • Extensive experience designing Star Schema, Snowflake Schema, and Operational Data Stores (ODS).
  • Strong hands-on knowledge of Big Data tools including Hadoop, Spark, Hive, Pig, Impala, PySpark, and Spark SQL.
  • Experienced with cloud platforms including AWS (Redshift, S3, EMR) and Microsoft Azure (Azure Data Lake).
  • Proficient in SQL and procedural languages across Oracle, Teradata, Netezza, and DB2.
  • Skilled in Data Quality, Scrubbing, Mapping, Profiling, and Validation within ETL workflows using tools like Informatica PowerCenter and SSIS.
  • Advanced knowledge of machine learning techniques such as Random Forest, SVM, Decision Trees, Logistic Regression, and Neural Networks.
  • Proficient with Python and R libraries (scikit-learn, ggplot2, dplyr, caret) for statistical analysis and model building.
  • Experienced in Data Visualization using Tableau; able to create dashboards and visual narratives from complex datasets.
  • Adept in Metadata documentation, performance tuning, and handling Slowly Changing Dimensions (SCDs).
  • Familiar with industry methodologies including SDLC, Agile, and Rational Unified Process (RUP).
  • Knowledgeable in data extraction from various formats and sources, including flat files, XML, Oracle, and IBM DB2.
  • Experienced in automation scripting using UNIX Shell and Perl for ETL operations.

Overview

9 years of professional experience

Work History

Senior Data Scientist

Markel, Glen Allen, VA
03.2024 - Current

  • Designed and implemented an enterprise-grade data warehouse architecture, integrating data from multiple sources into an EDW using Spark, Hive, and Redshift.
  • Built scalable data pipelines for big data processing and machine learning use cases using Apache Spark, Python, and Hadoop ecosystem tools (HDFS, Hive, MapReduce).
  • Developed and deployed NLP models for sentiment analysis and implemented advanced ML algorithms (classification, regression, clustering, dimensionality reduction) using Spark MLlib and Python.
  • Improved user lifetime value by 45% and tripled conversions by applying predictive analytics and personalization techniques.
  • Migrated EDW to AWS cloud (EMR, S3, Redshift), optimizing performance and cost-efficiency; configured platform architecture for scalable ML workloads.
  • Managed data quality and consistency through SQL- and Hive-based validation scripts, Informatica MDM Hub, and STAR schema modeling for Tableau dashboards.
  • Leveraged Teradata utilities (BTEQ, FastLoad, TPump) and SQL across Redshift, Oracle, PostgreSQL, and MySQL for high-volume data ingestion and transformation.
  • Created interactive visualizations using Python and Tableau, delivering actionable insights to business stakeholders.


Data Scientist

Vertex, Palo Alto, CA
01.2022 - 03.2024

  • Developed enterprise data models, data dictionaries, and metadata repositories to support master data management and data governance initiatives.
  • Built and optimized large-scale data pipelines using Python, SQL, Spark, NiFi, and Kafka for real-time and batch processing in distributed environments.
  • Applied advanced ML and deep learning algorithms (XGBoost, TensorFlow, PyTorch) for predictive analytics and risk modeling; conducted EDA and hypothesis testing for business strategies.
  • Designed and deployed STAR schema models in Amazon Redshift; developed Tableau dashboards for actionable business intelligence.
  • Led cloud data infrastructure migrations to AWS and Azure (S3, EMR, Redshift, Azure Data Lake), enhancing scalability and storage efficiency.
  • Integrated NLP tools (NLTK, Stanford NLP) for text mining and unstructured data analysis; implemented clustering and optimization techniques on SQL platforms.
  • Optimized query performance in PostgreSQL, MongoDB, and Cassandra; used Databricks to orchestrate ML workflows and streamline ETL processes.
  • Conducted training sessions and workshops to upskill teams in modern data science practices, tools, and technologies.

Data Scientist

Finish Line, Indianapolis, IN
10.2019 - 12.2021

  • Built and deployed machine learning models (XGBoost, Random Forest, Neural Networks) using Python, TensorFlow, and PyTorch to solve complex business problems and enhance predictive accuracy.
  • Developed real-time and batch data pipelines using Apache Spark and Kafka; automated data ingestion and transformation workflows with Python and ETL tools.
  • Leveraged AWS and Azure platforms for scalable ML model deployment and managed large datasets across cloud infrastructure.
  • Conducted data analysis across SQL and NoSQL systems (MongoDB, Cassandra) to support business intelligence and data strategy initiatives.
  • Designed interactive dashboards in Tableau and Power BI to visualize insights; led A/B testing and statistical experiments for model validation.
  • Ensured data integrity through governance and quality controls using Alteryx and Talend; implemented best practices for data standardization.
  • Managed and mentored junior data professionals, while driving knowledge sharing through documentation and collaborative work in Databricks.
  • Supported strategic initiatives by aligning data architecture with organizational goals and recommending improvements based on root cause analysis.

Data Analyst/Data Modeler

Excellent WebWorld, Ahmedabad, India
11.2016 - 07.2019

  • Developed and implemented data mapping, transformation, and cleansing rules for Master Data Management (MDM) across OLTP and ODS environments.
  • Designed conceptual, logical, and physical data models using ERWin; enforced referential integrity and collaborated with application and modeling teams for validation.
  • Created OLAP metadata catalog tables and performed forward engineering for database schema generation; handled SQL-based data extraction and analysis tasks.
  • Identified slowly changing dimensions and designed dimensional hierarchies to support enterprise reporting and analytics.
  • Automated reporting using Teradata and ODBC connectivity to MS Excel; produced departmental performance reports using Oracle SQL and Excel.
  • Conducted ad hoc analysis using BTEQ and Teradata SQL scripts; ensured efficient data exchange between flat files and databases.

Education

Bachelor of Technology (B.Tech) - Computer Science Engineering

KL University
01.2017

Skills

    Programming & Data Science Languages:
    Python, R, SQL, Scala, Java

    Python Libraries & Machine Learning Frameworks:
    Pandas, NumPy, SciPy, Matplotlib, Seaborn, Plotly, scikit-learn, TensorFlow, PyTorch, Keras, XGBoost, LightGBM, H2O.ai

    Data Engineering & Big Data Tools:
    Apache Spark, Apache Hadoop, Apache Kafka, Apache Hive, Apache Flink, Databricks, Apache Pig, MapReduce, Apache NiFi, Apache Airflow, Presto

    ETL & Data Integration:
    Talend, Informatica PowerCenter, AWS Glue

    Databases & Data Warehousing:
    Amazon Redshift, Snowflake, Google BigQuery, Teradata, Oracle, PostgreSQL, MySQL, Microsoft SQL Server, MongoDB, Apache Cassandra

    Cloud Platforms:
    AWS (S3, EC2, EMR, Redshift), Microsoft Azure (Azure Data Lake, Synapse)

    Data Visualization:
    Tableau, Power BI, ggplot2, Plotly, Matplotlib, Seaborn

    Model Development & Analytics:
    Feature Engineering, Machine Learning, Natural Language Processing (NLP), Model Validation

    DevOps & Workflow Tools:
    Docker, Kubernetes, Git, Jenkins, Terraform, Ansible

    Collaboration & Experimentation Platforms:
    Jupyter Notebook, Google Colab, Apache Zeppelin

    Project Methodologies:
    Agile, Scrum, Kanban, Waterfall

Timeline

Senior Data Scientist

Markel
03.2024 - Current

Data Scientist

Vertex
01.2022 - 03.2024

Data Scientist

Finish Line
10.2019 - 12.2021

Data Analyst/Data Modeler

Excellent WebWorld
11.2016 - 07.2019

Bachelor of Technology (B.Tech) - Computer Science Engineering

KL University
01.2017