Vishal Ambhore

Frisco, TX

Summary

  • 10+ years of work experience in IT, involved in all phases of the software development lifecycle across multiple projects
  • Strong experience processing and analyzing large sets of structured, semi-structured, and unstructured data and supporting systems and application architecture
  • Extensive experience writing Hadoop jobs for data analysis per business requirements using Hive and Pig
  • Expertise in creating Hive internal/external tables and views using a shared metastore
  • Good knowledge of PySpark and Scala; developed code in both
  • Knowledge of serverless and managed AWS services such as Lambda, Athena, AWS Batch, S3, and EMR
  • Production support experience
  • Experience working with the Snowflake and Vertica data warehouses
  • Worked extensively with Sqoop to import and export data between RDBMS and HDFS
  • Proficient with big data ingestion and streaming tools such as Sqoop, Kafka, and Spark
  • Experience with data formats such as Avro and Parquet
  • Hands-on experience with sequence files, RC files, combiners, counters, dynamic partitions, and bucketing for best practices and performance improvement
  • Experience creating real-time data streaming solutions using Spark Core, Spark SQL, Spark Streaming, Kafka, and Apache Storm
  • Worked on Oozie to manage and schedule jobs on the Hadoop cluster
  • Performed data migration from Ab Initio and Informatica
  • Knowledge of PL/SQL; implemented functions, stored procedures, and triggers using PL/SQL
  • Knowledge of developing analytical components using Scala
  • Experience managing and reviewing Hadoop log files
  • Worked with the NoSQL database HBase to create tables and store data
  • Experience setting up Hive, Pig, HBase, and Sqoop on the Ubuntu operating system
  • Strong experience designing and developing relational databases, including Oracle 10g, MySQL, MS SQL Server, PL/SQL, and Netezza
  • Proficient with data visualization tools such as Tableau, Raw, and MS Excel
  • Experience developing web interfaces using XML, HTML, DHTML, and CSS
  • Good understanding of ETL processes and data warehousing
  • Strong experience writing UNIX shell scripts
  • Working on varied projects has provided exposure to and a good understanding of the different SDLC phases
  • Deploy code via Jenkins/Bitbucket; merge code to master in Bitbucket (similar to Git) after validation

Overview

14 years of professional experience

Work History

Big Data Hadoop Developer

Mastercard
09.2023 - Current
  • As part of the Data and Services team, building solutions to move data from SQL Server/Oracle to Impala
  • Data currently resides in the TAO database, which serves the calc and data-usage queries whose results are displayed in Aidweb
  • In the current architecture, Aidcalc is the query generator, which produces very large SQL queries
  • From the user's perspective, users open the MasterCard Intelligence dashboard to view current transactions, charts, etc.
  • Use Sqoop to move data from SQL Server/Oracle/MySQL/DB2 into the Hadoop ecosystem
  • Use a CI/CD approach for code deployment
  • Use an Agile board for daily work, with daily standups, weekly goals, and feature-sharing sessions
  • Move data from Postgres to an S3 bucket using Spark jobs (see the sketch after this list)
  • Move data from S3 to Teradata using Airflow
  • Used Airflow to create data pipelines
  • Used EMR jobs for data validation on data in S3 buckets
  • Worked on AWS EMR, Batch, S3, and Airflow for day-to-day development
  • Performed data validation in production using Impala
  • Worked with Hive on Spark to read data from Hive and build RDDs on top of it
  • Created CA7 scheduling jobs for the mainframe systems
  • Created Oozie workflows to manage the Hive and Spark jobs
  • Conducted knowledge-transfer sessions for new team members
  • Developed PL/SQL queries for source systems and complex applications
  • Fixed customer issues during emergencies
  • Wrote Scala code for data cleaning and data preparation
  • Ran jobs on EMR for testing
  • Developed multiple Scala jobs for data cleaning
  • Created Hive tables and worked with them using HiveQL
  • Manage production support (L1) in the production environment
  • Handle data support for both dev and production data
  • Consumed data from Ab Initio/Informatica applications
  • Environment: Apache Hadoop Framework, HDFS, YARN (MR2.X), HIVE, Scala, Spark, SQOOP, IMPALA
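A minimal sketch of the Postgres-to-S3 movement described above, assuming a Spark batch job (for example on EMR); the host, credentials, table name (public.transactions), partition column, and bucket path are placeholders, not the actual Mastercard objects:

```scala
import org.apache.spark.sql.SparkSession

object PostgresToS3 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("postgres-to-s3")
      .getOrCreate()

    // Read the source table over JDBC; connection details and table name are placeholders.
    val txns = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://pg-host:5432/reporting")
      .option("dbtable", "public.transactions")
      .option("user", sys.env("PG_USER"))
      .option("password", sys.env("PG_PASSWORD"))
      .option("driver", "org.postgresql.Driver")
      .load()

    // Write to S3 as Parquet, partitioned by a business date, for downstream Airflow/EMR steps.
    txns.write
      .mode("overwrite")
      .partitionBy("txn_date")
      .parquet("s3://example-bucket/landing/transactions/")

    spark.stop()
  }
}
```

Partitioning the Parquet output by a date column can keep the downstream S3-to-Teradata loads in Airflow incremental.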

Lead Big Data Developer

Truist Bank
05.2020 - 09.2023
  • Experienced in development on the Cloudera distribution
  • As a Hadoop Developer, my responsibility was to manage the data pipelines and the data lake
  • Developed PySpark code using Spark SQL for faster data processing
  • Implemented Spark jobs in PySpark, using the DataFrame and Spark SQL APIs for faster testing and processing of data
  • Migrated data from mainframe systems to the Hadoop ecosystem using Syncsort
  • Built containers and tenants per customer requests
  • Built Hadoop tables on top of the containers
  • Gathered business requirements for data transformation, data preparation, and data cleaning logic
  • Developed the corresponding logic in Scala
  • Managed the code in VSTS
  • Performed data validation in production using Impala
  • Worked with Hive on Spark to read data from Hive and build RDDs on top of it
  • Created CA7 scheduling jobs for the mainframe systems
  • Created Oozie workflows to manage the Hive and Spark jobs
  • Conducted knowledge-transfer sessions for new team members
  • Fixed customer issues during emergencies
  • Developed multiple Spark SQL jobs for data cleaning
  • Created Hive tables and worked with them using HiveQL
  • Consumed real-time data using Kafka and Spark Streaming (see the sketch after this list)
  • Managed production support (L1) in the production environment
  • Good knowledge of Kafka architecture
  • Handled data support for both dev and production data
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs
  • Involved in identifying and analyzing defects, questionable function errors, and inconsistencies observed in the output
  • Worked on PL/SQL queries
  • Used Hadoop scripts for HDFS (Hadoop Distributed File System) data loading and manipulation
  • Assisted in loading large sets of structured data to HDFS
  • Environment: Apache Hadoop Framework, HDFS, YARN (MR2.X), HIVE, Scala, Spark, SQOOP, IMPALA
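A minimal sketch of the Kafka plus Spark Streaming consumption mentioned above, written here with Spark Structured Streaming in Scala (the role used both PySpark and Scala); the broker address, topic, JSON schema, and HDFS paths are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object KafkaIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-ingest").getOrCreate()
    import spark.implicits._

    // Expected JSON layout of the messages; field names are illustrative only.
    val schema = new StructType()
      .add("account_id", StringType)
      .add("amount", DoubleType)
      .add("event_ts", TimestampType)

    // Subscribe to the source topic; broker and topic names are placeholders.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "transactions")
      .load()

    // Parse the Kafka value payload and keep only well-formed records.
    val parsed = raw
      .select(from_json($"value".cast("string"), schema).as("rec"))
      .select("rec.*")
      .filter($"account_id".isNotNull)

    // Land micro-batches in HDFS as Parquet, with checkpointing for recovery.
    parsed.writeStream
      .format("parquet")
      .option("path", "/data/landing/transactions")
      .option("checkpointLocation", "/data/checkpoints/transactions")
      .start()
      .awaitTermination()
  }
}
```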

Hadoop Developer

Equitable-Insurance
01.2019 - 05.2020
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data
  • Designed a custom Spark REPL application to handle similar datasets
  • Used Hadoop scripts for HDFS (Hadoop Distributed File System) data loading and manipulation
  • Performed Hive test queries on local sample files and HDFS files
  • Used AWS services like EC2 and S3 for small data sets
  • Developed the application on Eclipse IDE
  • Developed Hive queries to analyze data and generate results
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing
  • Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase, Spark, and Sqoop
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization, and user report generation
  • Made Hive connections with the Informatica engine
  • Consumed Ab Initio data in the Hadoop ecosystem
  • Used Scala to write code for all Spark use cases
  • Analyzed user request patterns and implemented various performance optimization measures including implementing partitions and buckets in HiveQL
  • Assigned names to columns using case classes in Scala (see the sketch after this list)
  • Developed multiple Spark SQL jobs for data cleaning
  • Created Hive tables and worked on them using Hive QL
  • Assisted in loading large sets of data (Structured, Semi Structured, and Unstructured) to HDFS
  • Environment: Apache Hadoop Framework, HDFS, YARN (MR2.X), HIVE, Scala, Spark, SQOOP, IMPALA
  • Collaborated with cross-functional teams for seamless integration of Hadoop components into existing enterprise infrastructure.
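A minimal sketch of the case-class column naming and Hive partitioning mentioned above; the Policy fields, input path, delimiter, and table name are hypothetical:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Case class used to give the raw, positional fields meaningful column names.
case class Policy(policyId: String, state: String, premium: Double, effDate: String)

object PolicyLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("policy-load")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Parse pipe-delimited records from HDFS; the path and layout are illustrative.
    val policies = spark.sparkContext
      .textFile("/data/raw/policies/*.txt")
      .map(_.split("\\|"))
      .filter(_.length >= 4)
      .map(f => Policy(f(0), f(1), f(2).toDouble, f(3)))
      .toDF()

    // Write into a Hive table partitioned by state so user queries can prune partitions.
    policies.write
      .mode(SaveMode.Overwrite)
      .partitionBy("state")
      .saveAsTable("analytics.policies")
  }
}
```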

Hadoop Developer

Synchrony Bank
09.2017 - 12.2018
  • Experienced in development using the Cloudera distribution system
  • As a Hadoop Developer, my responsibility was to manage the data pipelines and the data lake
  • Experience working with the Snowflake data warehouse
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data
  • Designed a custom Spark REPL application to handle similar datasets
  • Used Hadoop scripts for HDFS (Hadoop Distributed File System) data loading and manipulation
  • Performed Hive test queries on local sample files and HDFS files
  • Used AWS services like EC2 and S3 for small data sets
  • Developed the application on Eclipse IDE
  • Developed Hive queries to analyze data and generate results
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing
  • Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase, Spark, and Sqoop
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation
  • Developed complex PL/SQL queries
  • Used Scala to write code for all Spark use cases
  • Analyzed user request patterns and implemented various performance optimization measures including implementing partitions and buckets in HiveQL
  • Assigned names to columns using case classes in Scala
  • Developed multiple Spark SQL jobs for data cleaning
  • Created Hive tables and worked on them using Hive QL
  • Assisted in loading large sets of data (Structured, Semi Structured, and Unstructured) to HDFS
  • Developed Spark SQL to load tables into HDFS to run select queries on top
  • Developed analytical components using Scala, Spark, and Spark Stream
  • Used visualization tools such as Power View for Excel and Tableau to visualize data and generate reports
  • Worked on the NoSQL databases HBase and MongoDB
  • Read data from different topics in Kafka
  • Moved data from S3 buckets to the Snowflake data warehouse for report generation (see the sketch after this list)
  • Wrote Hive queries for data analysis to meet business requirements
  • Migrated an existing on-premises application to AWS
  • Environment: Apache Hadoop Framework, HDFS, YARN (MR2.X), HIVE, Scala, Spark, SQOOP, IMPALA
  • Delivered accurate reports through comprehensive analysis of complex datasets, using Sqoop to import/export data between relational databases and Hadoop clusters.
  • Verified new product development effort alignment with supportability goals and proposed Service Level Agreement (SLA) parameters.
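A minimal sketch of the S3-to-Snowflake movement mentioned above, assuming the spark-snowflake connector is on the classpath; the account URL, warehouse, database, bucket path, and table names are placeholders:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object S3ToSnowflake {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("s3-to-snowflake").getOrCreate()

    // Read curated Parquet data from S3; bucket and prefix are placeholders.
    val report = spark.read.parquet("s3://example-bucket/curated/daily_report/")

    // Connection options for the spark-snowflake connector (values are placeholders).
    val sfOptions = Map(
      "sfURL"       -> "example_account.snowflakecomputing.com",
      "sfUser"      -> sys.env("SF_USER"),
      "sfPassword"  -> sys.env("SF_PASSWORD"),
      "sfDatabase"  -> "REPORTING",
      "sfSchema"    -> "PUBLIC",
      "sfWarehouse" -> "REPORT_WH"
    )

    // Push the DataFrame into the Snowflake table used for report generation.
    report.write
      .format("net.snowflake.spark.snowflake")
      .options(sfOptions)
      .option("dbtable", "DAILY_REPORT")
      .mode(SaveMode.Overwrite)
      .save()
  }
}
```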

Spark Developer

Capital One
01.2017 - 09.2017
  • Experienced in development on the Cloudera distribution
  • As a Hadoop Developer, my responsibility was to manage the data pipelines and the data lake
  • Performed Hadoop ETL using Hive on data at different stages of the pipeline
  • Worked in an Agile environment with Scrum
  • Sqooped data from different source systems and automated the loads with Oozie workflows
  • Generated business reports from the data lake using Hadoop SQL (Impala) per business needs
  • Automated business reports on the data lake with Bash scripts on UNIX and delivered them to business owners
  • Developed Spark Scala code to cleanse and perform ETL on data at different stages of the pipeline
  • Worked in different environments such as DEV, QA, Data Lake, and the Analytics Cluster as part of Hadoop development
  • Snapped the cleansed data to the Analytics Cluster for business reporting
  • Developed Pig scripts and Python streaming jobs and created Hive tables on top of the results
  • Developed multiple POCs in Scala, deployed them on the YARN cluster, and compared the performance of Spark and SQL
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, and Scala
  • Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data
  • Developed Oozie workflows to run multiple Hive, Pig, Sqoop, and Spark jobs
  • Handled importing of data from various data sources, performed transformations using Hive, Spark and loaded data into HDFS
  • Developed Pig, Hive, Sqoop, Hadoop Streaming, and Spark actions in Oozie for workflow management
  • Supported MapReduce programs running on the cluster
  • Experienced in collecting, aggregating, and moving large amounts of streaming data into HDFS using Flume and Kafka
  • Responsible for developing multiple Kafka producers and consumers from scratch per the software requirement specifications (see the sketch after this list)
  • Good understanding of workflow management processes and implementation
  • Environment: Linux, Apache Hadoop Framework, HDFS, YARN (MR2.X), HIVE, HBASE, AWS (S3, EMR), Scala, Spark, SQOOP
  • Simplified complex business logic by designing modularized code structures within a functional programming paradigm using Scala or Python alongside Apache Spark libraries.
  • Collaborated with cross-functional teams to define requirements, design solutions, and ensure successful project delivery utilizing the Apache Spark technology stack.
  • Ensured the reliability of data pipelines by implementing robust error handling mechanisms within Spark applications.
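A minimal sketch of a Kafka producer of the kind mentioned above, using the standard Java Kafka client from Scala; the broker address, topic name, and payloads are placeholders:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object EventProducer {
  def main(args: Array[String]): Unit = {
    // Producer configuration; broker address is a placeholder.
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("acks", "all") // wait for full acknowledgement to avoid silent data loss

    val producer = new KafkaProducer[String, String](props)
    try {
      // In a real pipeline the payloads would come from the upstream source system.
      val events = Seq("""{"id":"1","status":"ok"}""", """{"id":"2","status":"ok"}""")
      events.foreach { e =>
        producer.send(new ProducerRecord[String, String]("events", e))
      }
      producer.flush()
    } finally {
      producer.close()
    }
  }
}
```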

Hadoop Developer

AbbVie
10.2013 - 09.2015
  • Responsible for managing data coming from different sources and involved in HDFS maintenance and loading of structured and unstructured data
  • Developed MapReduce programs to parse the raw data and store the refined data in tables (see the sketch after this list)
  • Designed and modified database tables and used HBase queries to insert and fetch data from tables
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports
  • Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data
  • Extensively wrote SQL Queries (Sub queries, correlated subqueries, and Join conditions) for Data Accuracy, Data Analysis, and Data Extraction needs
  • Worked on Data mapping, and logical data modeling using SQL queries to filter data within the Oracle database tables
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS that were further used for analysis
  • Used Cloud watch logs to move application logs to S3 and create alarms based on a few exceptions raised by applications
  • Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs
  • Used Oozie operational services for batch processing and scheduling workflows dynamically
  • Developed and updated social media analytics dashboards regularly
  • Performed data mining investigations to find new insights related to customers
  • Managed and reviewed Hadoop log files
  • Used Vertica as the enterprise data warehouse
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day
  • Developed and generated insights based on brand conversations, which in turn were helpful for effectively driving brand awareness, engagement, and traffic to social media pages
  • Involved in the identification of topics and trends and building context around that brand
  • Involved in identifying, and analyzing defects, questionable function errors, and inconsistencies observed in the output
  • Environment: HBase, Hadoop, HDFS, MapReduce, Hive, Sqoop, Flume 1.3, Oozie, Zookeeper, MySQL, and Eclipse
  • Facilitated easier querying of semi-structured JSON/XML documents using NoSQL databases like MongoDB or Apache Cassandra as complementary storage options alongside HDFS.
  • Authored documentation for data dictionaries, business rules, and intake parameters and presented collected information to decision-makers.
  • Migrated legacy ETL processes to a more scalable solution using Spark, reducing processing times.
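A minimal sketch of a MapReduce job of the kind described above, counting unique visitors per day from tab-delimited web logs; the input layout and field positions are assumptions:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Mapper: emit (date, visitorId) for each log line; field positions are assumed.
class VisitorMapper extends Mapper[LongWritable, Text, Text, Text] {
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
    val fields = value.toString.split("\t")
    if (fields.length > 1) ctx.write(new Text(fields(0)), new Text(fields(1)))
  }
}

// Reducer: count distinct visitor ids per date.
class UniqueVisitorReducer extends Reducer[Text, Text, Text, LongWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[Text],
                      ctx: Reducer[Text, Text, Text, LongWritable]#Context): Unit = {
    val distinct = scala.collection.mutable.Set[String]()
    val it = values.iterator()
    while (it.hasNext) distinct += it.next().toString
    ctx.write(key, new LongWritable(distinct.size))
  }
}

object UniqueVisitorsJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "unique-visitors-per-day")
    job.setJarByClass(classOf[VisitorMapper])
    job.setMapperClass(classOf[VisitorMapper])
    job.setReducerClass(classOf[UniqueVisitorReducer])
    job.setMapOutputKeyClass(classOf[Text])
    job.setMapOutputValueClass(classOf[Text])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[LongWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))     // input log directory
    FileOutputFormat.setOutputPath(job, new Path(args(1)))   // output directory
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```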

Java Developer

DB-Xento
10.2010 - 09.2013
  • Used XML for ORM mapping relations with the Java classes and the database
  • Worked in analysis, design, and coding for client development using the J2EE stack on the Eclipse platform
  • Involved in creating web-based Java components like client Applets and client-side UI using JFC in Eclipse
  • Developed PL/SQL stored procedures to perform complex database operations (see the sketch after this list)
  • Used Struts in the presentation tier
  • Used Subversion as the version control system
  • Played a key role in the design and development of applications using J2EE, Struts, Spring
  • Involved in various phases of the software Development Life Cycle
  • Configured Struts framework to implement MVC design patterns
  • Designed and developed GUI using JSP, HTML, DHTML, and CSS
  • Generated the Hibernate XML and Java Mappings for the schemas
  • Used Rational Application Developer (RAD) as Integrated Development Environment (IDE)
  • Extensively used Core Java, Servlets, JSP, and XML
  • Used Oracle WebLogic workshop to generate the web service artifacts from the given WSDL for JAX-WS specification
  • Environment: Java, Struts, Servlets, Spring, Tomcat, Hibernate, HTML, JSP, XML, SQL, J2EE, Junit, Oracle11g, Windows
  • Reduced software bugs by conducting thorough unit testing and collaborating with QA teams.
  • Reviewed code and debugged errors to improve performance.
  • Enhanced application performance by optimizing Java code and implementing efficient algorithms.
  • Troubleshot complex issues within existing software applications, identifying root causes and implementing effective solutions.
  • Streamlined development processes by employing Agile methodologies and participating in Scrum meetings.
  • Contributed to the successful completion of projects by meeting tight deadlines and delivering high-quality code.
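The work in this role was in Java with Struts, Spring, and Hibernate; as an illustration only, here is a minimal JVM sketch (written in Scala for consistency with the other sketches) of invoking a PL/SQL stored procedure over JDBC, where the connection string, package, and procedure name are hypothetical:

```scala
import java.sql.{Connection, DriverManager, Types}

object ProcedureCallExample {
  def main(args: Array[String]): Unit = {
    // Connection details are placeholders for an Oracle instance.
    val conn: Connection = DriverManager.getConnection(
      "jdbc:oracle:thin:@db-host:1521:ORCL", sys.env("DB_USER"), sys.env("DB_PASSWORD"))
    try {
      // Call a hypothetical stored procedure that returns an account balance.
      val stmt = conn.prepareCall("{call pkg_accounts.get_balance(?, ?)}")
      stmt.setString(1, "ACC-1001")                // IN: account id
      stmt.registerOutParameter(2, Types.NUMERIC)  // OUT: balance
      stmt.execute()
      println(s"Balance: ${stmt.getBigDecimal(2)}")
      stmt.close()
    } finally {
      conn.close()
    }
  }
}
```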

Education

Master of Science - Computer Science

San Francisco Bay University
Fremont, CA
12.2016

Bachelor of Engineering -

Pune University
Pune, India
08.2007

Skills

  • Hadoop/Big Data: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Kafka, Spark, Storm, Zookeeper, Scala, Oozie, Ambari, Tez, R
  • Development Tools: Eclipse, IntelliJ, IBM DB2 Command Editor, TOAD, SQL Developer, VMware
  • Programming/Scripting Languages: Java, C, Unix Shell Scripting, Python, SQL, Pig Latin, HiveQL
  • Databases: Oracle 11g/10g/9i, MySQL, SQL Server 2005/2008, PostgreSQL, and DB2
  • NoSQL Databases: HBase, Cassandra, MongoDB
  • ETL: Informatica
  • Visualization: Tableau, Raw and MS Excel
  • Frameworks: Hibernate, JSF 2.0, Spring
  • Version Control Tools: Subversion (SVN), Concurrent Versions System (CVS), and IBM Rational ClearCase
  • Methodologies: Agile/ Scrum, Waterfall
  • Operating Systems: Windows, Unix, Linux and Solaris
  • API integration
  • Parquet file format
  • Kafka messaging system
  • Zookeeper coordination
  • Java big data development
  • Amazon EMR deployment
  • Hadoop ecosystem expertise

Timeline

Big Data Hadoop Developer

Mastercard
09.2023 - Current

Lead Big Data Developer

Truist Bank
05.2020 - 09.2023

Hadoop Developer

Equitable-Insurance
01.2019 - 05.2020

Hadoop Developer

Synchrony Bank
09.2017 - 12.2018

Spark Developer

Capital One
01.2017 - 09.2017

Hadoop Developer

AbbVie
10.2013 - 09.2015

Java Developer

DB-Xento
10.2010 - 09.2013

Master of Science - Computer Science

San Francisco Bay University

Bachelor of Engineering -

Pune University