Vishal Ambhore

Frisco, TX

Summary

  • 10+ years of work experience in IT, involved in all phases of the software development lifecycle across multiple projects
  • Strong experience processing and analyzing large sets of structured, semi-structured, and unstructured data and supporting systems and application architecture
  • Extensive experience writing Hadoop jobs for data analysis per business requirements using Hive and Pig
  • Expertise in creating Hive internal/external tables and views using a shared metastore
  • Good knowledge of PySpark and Scala; developed code in both
  • Knowledge of serverless and managed AWS services such as Lambda, Athena, AWS Batch, S3, and EMR
  • Production support experience
  • Experience working with the Snowflake and Vertica data warehouses
  • Worked extensively with Sqoop to import and export data between RDBMS and HDFS
  • Proficient with big data ingestion and streaming tools such as Sqoop, Kafka, and Spark
  • Experience with data formats such as Avro and Parquet
  • Hands-on experience with sequence files, RC files, combiners, counters, dynamic partitions, and bucketing for best practices and performance improvement
  • Experience creating real-time data streaming solutions using Spark Core, Spark SQL, Spark Streaming, Kafka, and Apache Storm
  • Worked on Oozie to manage and schedule jobs on the Hadoop cluster
  • Performed data migration from Ab Initio and Informatica
  • Knowledge of PL/SQL; implemented functions, stored procedures, and triggers using PL/SQL
  • Knowledge of developing analytical components using Scala
  • Experience managing and reviewing Hadoop log files
  • Worked with the NoSQL database HBase to create tables and store data
  • Experience setting up Hive, Pig, HBase, and Sqoop on the Ubuntu operating system
  • Strong experience designing and developing relational databases, including Oracle 10g, MySQL, MS SQL Server, PL/SQL, and Netezza
  • Proficient with data visualization tools such as Tableau, Raw, and MS Excel
  • Experience developing web interfaces using XML, HTML, DHTML, and CSS
  • Good understanding of ETL processes and data warehousing
  • Strong experience writing UNIX shell scripts
  • Working on varied projects has provided exposure to and a good understanding of the different SDLC phases
  • Deploy code via Jenkins/Bitbucket; merge code to master in Bitbucket (similar to Git) after validation

Overview

14 years of professional experience

Work History

Big Data Hadoop Developer

Mastercard
09.2023 - Current
  • As part of the Data and Services team, building solutions to move data from SQL Server/Oracle to Impala
  • Data currently resides in the TAO database, which serves the calc and data-usage queries whose results are displayed in Aidweb
  • In the current architecture, Aidcalc is the query generator, which produces very large SQL queries
  • From the user's perspective, users open the MasterCard Intelligence dashboard to view current transactions, charts, etc.
  • Use Sqoop to move data from SQL Server/Oracle/MySQL/DB2 into the Hadoop ecosystem
  • Use a CI/CD approach for code deployment
  • Use an Agile board for daily work, with daily standups, weekly goals, and feature-sharing sessions
  • Move data from Postgres to an S3 bucket using Spark jobs (see the sketch after this list)
  • Move data from S3 to Teradata using Airflow
  • Used Airflow to create data pipelines
  • Used EMR jobs for data validation on data in S3 buckets
  • Worked on AWS EMR, Batch, S3, and Airflow for day-to-day development
  • Performed data validation in production using Impala
  • Worked with Hive on Spark to read data from Hive and build RDDs on top of it
  • Created CA7 scheduling jobs for the mainframe systems
  • Created Oozie workflows to manage the Hive and Spark jobs
  • Conducted knowledge-transfer sessions for new team members
  • Developed PL/SQL queries for source systems and complex applications
  • Fixed customer issues during emergencies
  • Wrote Scala code for data cleaning and data preparation
  • Ran jobs on EMR for testing
  • Developed multiple Scala jobs for data cleaning
  • Created Hive tables and worked with them using HiveQL
  • Manage production support (L1) in the production environment
  • Handle data support for both dev and production data
  • Consumed data from Ab Initio/Informatica applications
  • Environment: Apache Hadoop Framework, HDFS, YARN (MR2.X), HIVE, Scala, Spark, SQOOP, IMPALA
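A minimal sketch of the Postgres-to-S3 movement described above, assuming a Spark batch job (for example on EMR); the host, credentials, table name (public.transactions), partition column, and bucket path are placeholders, not the actual Mastercard objects:

```scala
import org.apache.spark.sql.SparkSession

object PostgresToS3 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("postgres-to-s3")
      .getOrCreate()

    // Read the source table over JDBC; connection details and table name are placeholders.
    val txns = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://pg-host:5432/reporting")
      .option("dbtable", "public.transactions")
      .option("user", sys.env("PG_USER"))
      .option("password", sys.env("PG_PASSWORD"))
      .option("driver", "org.postgresql.Driver")
      .load()

    // Write to S3 as Parquet, partitioned by a business date, for downstream Airflow/EMR steps.
    txns.write
      .mode("overwrite")
      .partitionBy("txn_date")
      .parquet("s3://example-bucket/landing/transactions/")

    spark.stop()
  }
}
```

Partitioning the Parquet output by a date column can keep the downstream S3-to-Teradata loads in Airflow incremental.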

Lead Big Data Developer

Truist Bank
05.2020 - 09.2023
  • Experienced in development on the Cloudera distribution
  • As a Hadoop Developer, my responsibility was to manage the data pipelines and the data lake
  • Developed PySpark code using Spark SQL for faster data processing
  • Implemented Spark jobs in PySpark, using the DataFrame and Spark SQL APIs for faster testing and processing of data
  • Migrated data from mainframe systems to the Hadoop ecosystem using Syncsort
  • Built containers and tenants per customer requests
  • Built Hadoop tables on top of the containers
  • Gathered business requirements for data transformation, data preparation, and data cleaning logic
  • Developed the corresponding logic in Scala
  • Managed the code in VSTS
  • Performed data validation in production using Impala
  • Worked with Hive on Spark to read data from Hive and build RDDs on top of it
  • Created CA7 scheduling jobs for the mainframe systems
  • Created Oozie workflows to manage the Hive and Spark jobs
  • Conducted knowledge-transfer sessions for new team members
  • Fixed customer issues during emergencies
  • Developed multiple Spark SQL jobs for data cleaning
  • Created Hive tables and worked with them using HiveQL
  • Consumed real-time data using Kafka and Spark Streaming (see the sketch after this list)
  • Managed production support (L1) in the production environment
  • Good knowledge of Kafka architecture
  • Handled data support for both dev and production data
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs
  • Involved in identifying and analyzing defects, questionable function errors, and inconsistencies observed in the output
  • Worked on PL/SQL queries
  • Used Hadoop scripts for HDFS (Hadoop Distributed File System) data loading and manipulation
  • Assisted in loading large sets of structured data to HDFS
  • Environment: Apache Hadoop Framework, HDFS, YARN (MR2.X), HIVE, Scala, Spark, SQOOP, IMPALA
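A minimal sketch of the Kafka plus Spark Streaming consumption mentioned above, written here with Spark Structured Streaming in Scala (the role used both PySpark and Scala); the broker address, topic, JSON schema, and HDFS paths are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object KafkaIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-ingest").getOrCreate()
    import spark.implicits._

    // Expected JSON layout of the messages; field names are illustrative only.
    val schema = new StructType()
      .add("account_id", StringType)
      .add("amount", DoubleType)
      .add("event_ts", TimestampType)

    // Subscribe to the source topic; broker and topic names are placeholders.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "transactions")
      .load()

    // Parse the Kafka value payload and keep only well-formed records.
    val parsed = raw
      .select(from_json($"value".cast("string"), schema).as("rec"))
      .select("rec.*")
      .filter($"account_id".isNotNull)

    // Land micro-batches in HDFS as Parquet, with checkpointing for recovery.
    parsed.writeStream
      .format("parquet")
      .option("path", "/data/landing/transactions")
      .option("checkpointLocation", "/data/checkpoints/transactions")
      .start()
      .awaitTermination()
  }
}
```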

Hadoop Developer

Equitable-Insurance
01.2019 - 05.2020
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data
  • Designed a custom Spark REPL application to handle similar datasets
  • Used Hadoop scripts for HDFS (Hadoop Distributed File System) data loading and manipulation
  • Performed Hive test queries on local sample files and HDFS files
  • Used AWS services like EC2 and S3 for small data sets
  • Developed the application on Eclipse IDE
  • Developed Hive queries to analyze data and generate results
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing
  • Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase, Spark, and Sqoop
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization, and user report generation
  • Made Hive connections with the Informatica engine
  • Consumed Ab Initio data in the Hadoop ecosystem
  • Used Scala to write code for all Spark use cases
  • Analyzed user request patterns and implemented various performance optimization measures including implementing partitions and buckets in HiveQL
  • Assigned names to columns using case classes in Scala (see the sketch after this list)
  • Developed multiple Spark SQL jobs for data cleaning
  • Created Hive tables and worked on them using Hive QL
  • Assisted in loading large sets of data (Structured, Semi Structured, and Unstructured) to HDFS
  • Environment: Apache Hadoop Framework, HDFS, YARN (MR2.X), HIVE, Scala, Spark, SQOOP, IMPALA
  • Collaborated with cross-functional teams for seamless integration of Hadoop components into existing enterprise infrastructure.
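A minimal sketch of the case-class column naming and Hive partitioning mentioned above; the Policy fields, input path, delimiter, and table name are hypothetical:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Case class used to give the raw, positional fields meaningful column names.
case class Policy(policyId: String, state: String, premium: Double, effDate: String)

object PolicyLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("policy-load")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Parse pipe-delimited records from HDFS; the path and layout are illustrative.
    val policies = spark.sparkContext
      .textFile("/data/raw/policies/*.txt")
      .map(_.split("\\|"))
      .filter(_.length >= 4)
      .map(f => Policy(f(0), f(1), f(2).toDouble, f(3)))
      .toDF()

    // Write into a Hive table partitioned by state so user queries can prune partitions.
    policies.write
      .mode(SaveMode.Overwrite)
      .partitionBy("state")
      .saveAsTable("analytics.policies")
  }
}
```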

Hadoop Developer

Synchrony Bank
09.2017 - 12.2018
  • Experienced in development using the Cloudera distribution system
  • As a Hadoop Developer, my responsibility was to manage the data pipelines and the data lake
  • Experience working with the Snowflake data warehouse
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data
  • Designed a custom Spark REPL application to handle similar datasets
  • Used Hadoop scripts for HDFS (Hadoop Distributed File System) data loading and manipulation
  • Performed Hive test queries on local sample files and HDFS files
  • Used AWS services like EC2 and S3 for small data sets
  • Developed the application on Eclipse IDE
  • Developed Hive queries to analyze data and generate results
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing
  • Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase, Spark, and Sqoop
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation
  • Developed complex PL/SQL queries
  • Used Scala to write code for all Spark use cases
  • Analyzed user request patterns and implemented various performance optimization measures including implementing partitions and buckets in HiveQL
  • Assigned names to columns using case classes in Scala
  • Developed multiple Spark SQL jobs for data cleaning
  • Created Hive tables and worked on them using Hive QL
  • Assisted in loading large sets of data (Structured, Semi Structured, and Unstructured) to HDFS
  • Developed Spark SQL to load tables into HDFS to run select queries on top
  • Developed analytical components using Scala, Spark, and Spark Stream
  • Used visualization tools such as Power View for Excel and Tableau to visualize data and generate reports
  • Worked on the NoSQL databases HBase and MongoDB
  • Read data from different topics in Kafka
  • Moved data from S3 buckets to the Snowflake data warehouse for report generation (see the sketch after this list)
  • Wrote Hive queries for data analysis to meet business requirements
  • Migrated an existing on-premises application to AWS
  • Environment: Apache Hadoop Framework, HDFS, YARN (MR2.X), HIVE, Scala, Spark, SQOOP, IMPALA
  • Delivered accurate reports through comprehensive analysis of complex datasets, using Sqoop to import/export data between relational databases and Hadoop clusters.
  • Verified new product development effort alignment with supportability goals and proposed Service Level Agreement (SLA) parameters.
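A minimal sketch of the S3-to-Snowflake movement mentioned above, assuming the spark-snowflake connector is on the classpath; the account URL, warehouse, database, bucket path, and table names are placeholders:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object S3ToSnowflake {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("s3-to-snowflake").getOrCreate()

    // Read curated Parquet data from S3; bucket and prefix are placeholders.
    val report = spark.read.parquet("s3://example-bucket/curated/daily_report/")

    // Connection options for the spark-snowflake connector (values are placeholders).
    val sfOptions = Map(
      "sfURL"       -> "example_account.snowflakecomputing.com",
      "sfUser"      -> sys.env("SF_USER"),
      "sfPassword"  -> sys.env("SF_PASSWORD"),
      "sfDatabase"  -> "REPORTING",
      "sfSchema"    -> "PUBLIC",
      "sfWarehouse" -> "REPORT_WH"
    )

    // Push the DataFrame into the Snowflake table used for report generation.
    report.write
      .format("net.snowflake.spark.snowflake")
      .options(sfOptions)
      .option("dbtable", "DAILY_REPORT")
      .mode(SaveMode.Overwrite)
      .save()
  }
}
```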

Spark Developer

Capital One
01.2017 - 09.2017
  • Experienced in development on the Cloudera distribution
  • As a Hadoop Developer, my responsibility was to manage the data pipelines and the data lake
  • Performed Hadoop ETL using Hive on data at different stages of the pipeline
  • Worked in an Agile environment with Scrum
  • Sqooped data from different source systems and automated the loads with Oozie workflows
  • Generated business reports from the data lake using Hadoop SQL (Impala) per business needs
  • Automated business reports on the data lake with Bash scripts on UNIX and delivered them to business owners
  • Developed Spark Scala code to cleanse and perform ETL on data at different stages of the pipeline
  • Worked in different environments such as DEV, QA, Data Lake, and the Analytics Cluster as part of Hadoop development
  • Snapped the cleansed data to the Analytics Cluster for business reporting
  • Developed Pig scripts and Python streaming jobs and created Hive tables on top of the results
  • Developed multiple POCs in Scala, deployed them on the YARN cluster, and compared the performance of Spark and SQL
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, and Scala
  • Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data
  • Developed Oozie workflows to run multiple Hive, Pig, Sqoop, and Spark jobs
  • Handled importing of data from various data sources, performed transformations using Hive, Spark and loaded data into HDFS
  • Developed Pig, Hive, Sqoop, Hadoop Streaming, and Spark actions in Oozie for workflow management
  • Supported MapReduce programs running on the cluster
  • Experienced in collecting, aggregating, and moving large amounts of streaming data into HDFS using Flume and Kafka
  • Responsible for developing multiple Kafka producers and consumers from scratch per the software requirement specifications (see the sketch after this list)
  • Good understanding of workflow management processes and implementation
  • Environment: Linux, Apache Hadoop Framework, HDFS, YARN (MR2.X), HIVE, HBASE, AWS (S3, EMR), Scala, Spark, SQOOP
  • Simplified complex business logic by designing modularized code structures within a functional programming paradigm using Scala or Python alongside Apache Spark libraries.
  • Collaborated with cross-functional teams to define requirements, design solutions, and ensure successful project delivery utilizing the Apache Spark technology stack.
  • Ensured the reliability of data pipelines by implementing robust error handling mechanisms within Spark applications.
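A minimal sketch of a Kafka producer of the kind mentioned above, using the standard Java Kafka client from Scala; the broker address, topic name, and payloads are placeholders:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object EventProducer {
  def main(args: Array[String]): Unit = {
    // Producer configuration; broker address is a placeholder.
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("acks", "all") // wait for full acknowledgement to avoid silent data loss

    val producer = new KafkaProducer[String, String](props)
    try {
      // In a real pipeline the payloads would come from the upstream source system.
      val events = Seq("""{"id":"1","status":"ok"}""", """{"id":"2","status":"ok"}""")
      events.foreach { e =>
        producer.send(new ProducerRecord[String, String]("events", e))
      }
      producer.flush()
    } finally {
      producer.close()
    }
  }
}
```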

Hadoop Developer

AbbVie
10.2013 - 09.2015
  • Responsible for managing data coming from different sources and involved in HDFS maintenance and loading of structured and unstructured data
  • Developed MapReduce programs to parse the raw data and store the refined data in tables (see the sketch after this list)
  • Designed and modified database tables and used HBase queries to insert and fetch data from tables
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports
  • Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data
  • Extensively wrote SQL Queries (Sub queries, correlated subqueries, and Join conditions) for Data Accuracy, Data Analysis, and Data Extraction needs
  • Worked on Data mapping, and logical data modeling using SQL queries to filter data within the Oracle database tables
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS that were further used for analysis
  • Used Cloud watch logs to move application logs to S3 and create alarms based on a few exceptions raised by applications
  • Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs
  • Used Oozie operational services for batch processing and scheduling workflows dynamically
  • Developed and updated social media analytics dashboards regularly
  • Performed data mining investigations to find new insights related to customers
  • Managed and reviewed Hadoop log files
  • Used Vertica as the enterprise data warehouse
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day
  • Developed and generated insights based on brand conversations, which in turn were helpful for effectively driving brand awareness, engagement, and traffic to social media pages
  • Involved in the identification of topics and trends and building context around that brand
  • Involved in identifying, and analyzing defects, questionable function errors, and inconsistencies observed in the output
  • Environment: HBase, Hadoop, HDFS, MapReduce, Hive, Sqoop, Flume 1.3, Oozie, Zookeeper, MySQL, and Eclipse
  • Facilitated easier querying of semi-structured JSON/XML documents using NoSQL databases like MongoDB or Apache Cassandra as complementary storage options alongside HDFS.
  • Authored documentation for data dictionaries, business rules, and intake parameters and presented collected information to decision-makers.
  • Migrated legacy ETL processes to a more scalable solution using Spark, reducing processing times.
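A minimal sketch of a MapReduce job of the kind described above, counting unique visitors per day from tab-delimited web logs; the input layout and field positions are assumptions:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Mapper: emit (date, visitorId) for each log line; field positions are assumed.
class VisitorMapper extends Mapper[LongWritable, Text, Text, Text] {
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
    val fields = value.toString.split("\t")
    if (fields.length > 1) ctx.write(new Text(fields(0)), new Text(fields(1)))
  }
}

// Reducer: count distinct visitor ids per date.
class UniqueVisitorReducer extends Reducer[Text, Text, Text, LongWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[Text],
                      ctx: Reducer[Text, Text, Text, LongWritable]#Context): Unit = {
    val distinct = scala.collection.mutable.Set[String]()
    val it = values.iterator()
    while (it.hasNext) distinct += it.next().toString
    ctx.write(key, new LongWritable(distinct.size))
  }
}

object UniqueVisitorsJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "unique-visitors-per-day")
    job.setJarByClass(classOf[VisitorMapper])
    job.setMapperClass(classOf[VisitorMapper])
    job.setReducerClass(classOf[UniqueVisitorReducer])
    job.setMapOutputKeyClass(classOf[Text])
    job.setMapOutputValueClass(classOf[Text])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[LongWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))     // input log directory
    FileOutputFormat.setOutputPath(job, new Path(args(1)))   // output directory
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```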

Java Developer

DB-Xento
10.2010 - 09.2013
  • Used XML for ORM mapping relations with the Java classes and the database
  • Worked in analysis, design, and coding for client development using the J2EE stack on the Eclipse platform
  • Involved in creating web-based Java components like client Applets and client-side UI using JFC in Eclipse
  • Developed PL/SQL stored procedures to perform complex database operations (see the sketch after this list)
  • Used Struts in the presentation tier
  • Used Subversion as the version control system
  • Played a key role in the design and development of applications using J2EE, Struts, Spring
  • Involved in various phases of the software Development Life Cycle
  • Configured Struts framework to implement MVC design patterns
  • Designed and developed GUI using JSP, HTML, DHTML, and CSS
  • Generated the Hibernate XML and Java Mappings for the schemas
  • Used Rational Application Developer (RAD) as Integrated Development Environment (IDE)
  • Extensively used Core Java, Servlets, JSP, and XML
  • Used Oracle WebLogic workshop to generate the web service artifacts from the given WSDL for JAX-WS specification
  • Environment: Java, Struts, Servlets, Spring, Tomcat, Hibernate, HTML, JSP, XML, SQL, J2EE, Junit, Oracle11g, Windows
  • Reduced software bugs by conducting thorough unit testing and collaborating with QA teams.
  • Reviewed code and debugged errors to improve performance.
  • Enhanced application performance by optimizing Java code and implementing efficient algorithms.
  • Troubleshot complex issues within existing software applications, identifying root causes and implementing effective solutions.
  • Streamlined development processes by employing Agile methodologies and participating in Scrum meetings.
  • Contributed to the successful completion of projects by meeting tight deadlines and delivering high-quality code.
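The work in this role was in Java with Struts, Spring, and Hibernate; as an illustration only, here is a minimal JVM sketch (written in Scala for consistency with the other sketches) of invoking a PL/SQL stored procedure over JDBC, where the connection string, package, and procedure name are hypothetical:

```scala
import java.sql.{Connection, DriverManager, Types}

object ProcedureCallExample {
  def main(args: Array[String]): Unit = {
    // Connection details are placeholders for an Oracle instance.
    val conn: Connection = DriverManager.getConnection(
      "jdbc:oracle:thin:@db-host:1521:ORCL", sys.env("DB_USER"), sys.env("DB_PASSWORD"))
    try {
      // Call a hypothetical stored procedure that returns an account balance.
      val stmt = conn.prepareCall("{call pkg_accounts.get_balance(?, ?)}")
      stmt.setString(1, "ACC-1001")                // IN: account id
      stmt.registerOutParameter(2, Types.NUMERIC)  // OUT: balance
      stmt.execute()
      println(s"Balance: ${stmt.getBigDecimal(2)}")
      stmt.close()
    } finally {
      conn.close()
    }
  }
}
```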

Education

Master of Science - Computer Science

San Francisco Bay University
Fremont, CA
12.2016

Bachelor of Engineering -

Pune University
Pune, India
08.2007

Skills

  • Hadoop/Big Data: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Kafka, Spark, Storm, Zookeeper, Scala, Oozie, Ambari, Tez, R
  • Development Tools: Eclipse, IntelliJ, IBM DB2 Command Editor, TOAD, SQL Developer, VMware
  • Programming/Scripting Languages: Java, C, Unix Shell Scripting, Python, SQL, Pig Latin, HiveQL
  • Databases: Oracle 11g/10g/9i, MySQL, SQL Server 2005/2008, PostgreSQL, and DB2
  • NoSQL Databases: HBase, Cassandra, MongoDB
  • ETL: Informatica
  • Visualization: Tableau, Raw and MS Excel
  • Frameworks: Hibernate, JSF 2.0, Spring
  • Version Control Tools: Subversion (SVN), Concurrent Versions System (CVS), and IBM Rational ClearCase
  • Methodologies: Agile/ Scrum, Waterfall
  • Operating Systems: Windows, Unix, Linux and Solaris
  • API integration
  • Parquet file format
  • Kafka messaging system
  • Zookeeper coordination
  • Java big data development
  • Amazon EMR deployment
  • Hadoop ecosystem expertise

Timeline

Big Data Hadoop Developer

Mastercard
09.2023 - Current

Lead Big Data Developer

Truist Bank
05.2020 - 09.2023

Hadoop Developer

Equitable-Insurance
01.2019 - 05.2020

Hadoop Developer

Synchrony Bank
09.2017 - 12.2018

Spark Developer

Capital One
01.2017 - 09.2017

Hadoop Developer

AbbVie
10.2013 - 09.2015

Java Developer

DB-Xento
10.2010 - 09.2013

Master of Science - Computer Science

San Francisco Bay University

Bachelor of Engineering -

Pune University