SHIVAM BHATIA

Summary

Senior IT professional, experienced and adept in designing, implementing, and maintaining solutions on the Big Data ecosystem.
  • Deep understanding of Hadoop architecture, the Spark execution engine, Hive data warehousing, and NoSQL databases.
  • Extensive work implementing and optimizing Hadoop/MapReduce algorithms for Big Data analytics, including MapReduce programs written in Java that transform data sets with mapper and reducer tasks.
  • Developed reusable, configurable components in Scala and Python as part of project requirements.
  • Good knowledge of Scala's functional programming techniques, such as anonymous functions (closures), currying, higher-order functions, and pattern matching (see the sketch below).
  • Experience using Sqoop to import data from RDBMS into HDFS and to export it back.
  • Good knowledge of NoSQL databases such as Cassandra, MongoDB, and HBase.
  • Strong problem-solving and analytical skills, with the ability to make balanced, independent decisions.
  • Proficient with version control systems (GitHub, SVN), the Maven build tool, and Jenkins for continuous integration.
  • Strong debugging and critical-thinking abilities, with a good understanding of advances in frameworks, methodologies, and strategies.
  • Efficient cloud engineer with years of experience assembling cloud infrastructure; negotiates with vendors, coordinates tasks with other IT team members, and applies best practices when building applications and databases on clouds such as AWS and GCP.
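A brief sketch of the Scala functional techniques named above. This is illustrative only, not project code; all names and values are made up.

    object FunctionalSketch extends App {
      // Anonymous function that closes over `rate` (a closure)
      val rate = 0.07
      val withTax: Double => Double = price => price * (1 + rate)

      // Currying: a function with two parameter lists, partially applied
      def add(a: Int)(b: Int): Int = a + b
      val addTen = add(10) _ // Int => Int

      // Higher-order function: map takes `withTax` as an argument
      val totals = List(100.0, 250.0).map(withTax)

      // Pattern matching on value, type, and guard
      def describe(x: Any): String = x match {
        case 0               => "zero"
        case n: Int if n > 0 => "a positive integer"
        case s: String       => s"a string of length ${s.length}"
        case _               => "something else"
      }

      println(totals)         // tax-adjusted amounts
      println(addTen(5))      // 15
      println(describe("hi")) // a string of length 2
    }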

Overview

9 years of professional experience

Work History

Big Data Developer

Bank of America Corporation
Newark, DE
10.2021 - 12.2023
  • Configured different topologies for Spark clusters and deployed them on a regular basis
  • Worked on Spark SQL, reading and writing data from JSON, text, and Parquet files and schema RDDs (see the Spark SQL sketch after this list)
  • Loaded data into HBase using both bulk and non-bulk loads
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop
  • Used Impala to read, write, and query Hadoop data in HDFS from HBase or Cassandra, and configured Kafka to read and write messages from external programs
  • Experience working with SQL, HiveQL, Spark SQL, and shell scripts, as well as views, indexes, stored procedures, and other components of database applications
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala
  • Coordinated resources by working closely with project managers for releases, and carried out deployments and builds on various environments using continuous integration tools
  • Created Hive queries that helped market analysts spot emerging trends by comparing incremental data
  • Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into those tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns
  • Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries
  • Environment: Hive, Airflow, AWS EMR, AWS Glue, Maven, Impala, Spark, YARN, GitHub, Tableau, Unix, Cloudera, GCP, AWS, Redshift, Snowflake, Sqoop, HDFS, Scala, Spark SQL, Python, Informatica Big Data Edition, PowerCenter, Beeline
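A minimal sketch of the Spark SQL pattern described above: reading JSON, expressing a Hive-style query as a Spark transformation, and persisting results. The paths, view name, and Hive table are hypothetical, not taken from the project.

    import org.apache.spark.sql.{SaveMode, SparkSession}

    object SparkSqlIoSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("spark-sql-io-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Read a semi-structured JSON source (path is hypothetical)
        val events = spark.read.json("hdfs:///landing/events/")

        // Express a Hive-style query as a Spark SQL transformation
        events.createOrReplaceTempView("events")
        val daily = spark.sql(
          """SELECT event_date, COUNT(*) AS event_count
            |FROM events
            |GROUP BY event_date""".stripMargin)

        // Persist results as Parquet and as a Hive table (names are hypothetical)
        daily.write.mode(SaveMode.Overwrite).parquet("hdfs:///curated/daily_counts/")
        daily.write.mode(SaveMode.Overwrite).saveAsTable("analytics.daily_counts")

        spark.stop()
      }
    }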

Big Data/ Hadoop Developer

Charles Schwab
TX
01.2020 - 09.2021
  • Responsible for building scalable distributed data solutions using Hadoop
  • Developed and implemented automation scripts for Azure services using PowerShell, Python, and Bash
  • Configured, monitored, and maintained Azure Storage Accounts, Virtual Networks, App Services, Web Apps, and VMs
  • Managed Azure IaaS and PaaS resources such as SQL Database, Service Bus Queues, Event Hubs, and Automation Accounts
  • Worked in a fast-paced agile development environment to quickly analyze, develop, and test potential use cases for the business
  • Ingested data from the mainframe to the Hadoop landing zone using PowerCenter to handle VSAM files, and developed Spark scripts to cleanse data and apply required business transformations before processing through the data warehouse
  • Designed and developed ETL integration patterns using Python on Spark
  • Designed and maintained data governance and security for data platforms on AWS Cloud
  • Created the source and destination endpoints for an Aurora RDS migration using AWS Database Migration Service
  • Created the task to transfer data from source to destination and used the AWS Schema Conversion Tool to create the schema in the target database
  • Developed a framework for converting existing Informatica mappings into PySpark (Python on Spark) jobs
  • Developed ETL mappings, reusable transformations, and mapplets using the Informatica Developer tool to extract data from ingestion target tables/HDFS files, and applied technical and business transformations to load data into enterprise data warehouse systems
  • Developed Python scripts using both DataFrames/SQL and RDDs/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into HDFS (see the aggregation sketch after this list)
  • Tuned Spark applications by setting the right batch interval and the correct level of parallelism and by tuning memory; loaded data into Spark RDDs and performed in-memory computation to generate the desired output
  • Enhanced the decision-making process with RDBMS data
  • Worked with application teams on Hadoop updates, patches, and version upgrades as required
  • Worked as a lead to test the disaster recovery process and developed a Python script, run every three months, to validate production vs. DR cluster data
  • Environment: Power BI, Hive, Maven, Impala, Spark, YARN, GitHub, Tableau, Unix, Cloudera, AWS, GCP, Redshift, Sqoop, HDFS, Scala, Spark SQL, Python, Informatica Big Data Edition, PowerCenter, Beeline
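A minimal sketch contrasting the DataFrame/SQL and RDD styles of aggregation mentioned above. The data is made up, and the code targets the modern SparkSession API rather than the Spark 1.6 API named in the role; the shuffle-partitions setting is one illustrative tuning knob.

    import org.apache.spark.sql.SparkSession

    object AggregationSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("aggregation-sketch")
          // Illustrative tuning knob: shuffle parallelism
          .config("spark.sql.shuffle.partitions", "200")
          .getOrCreate()
        import spark.implicits._

        val sales = Seq(("east", 10.0), ("west", 5.0), ("east", 2.5))
          .toDF("region", "amount")

        // DataFrame/SQL-style aggregation
        val byRegionDf = sales.groupBy("region").sum("amount")

        // Equivalent RDD-style aggregation
        val byRegionRdd = sales.rdd
          .map(row => (row.getString(0), row.getDouble(1)))
          .reduceByKey(_ + _)

        byRegionDf.show()
        byRegionRdd.collect().foreach(println)
        spark.stop()
      }
    }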

Power BI Developer/ Hadoop

Walgreens
WA
11.2018 - 12.2019
  • Responsible for building scalable distributed data solutions using Power BI
  • Developed and implemented data warehouses to store the company's business intelligence
  • Provided technical expertise in developing, deploying, and optimizing business intelligence solutions
  • Created complex SQL queries for extracting data from multiple sources
  • Conducted unit testing of BI applications to identify bugs or errors before production deployment
  • Designed high-availability solutions by leveraging load balancers, availability sets, and Traffic Manager profiles in the Azure environment
  • Integrated third-party applications with Microsoft Azure using APIs and RESTful web services
  • Involved in writing Spark applications in Scala for ETL and data analysis
  • Implemented AWS Step Functions to automate and orchestrate Amazon SageMaker tasks such as publishing data to S3, training the ML model, and deploying it for prediction
  • Integrated Apache Airflow with AWS to monitor multi-stage ML workflows with tasks running on Amazon SageMaker
  • Loaded data from different relational databases into HDFS using Sqoop
  • Experience working with SQL, HiveQL, Spark SQL, and shell scripts, as well as views, indexes, stored procedures, and other components of database applications
  • Worked on performance tuning of Spark DataFrames for aggregation using dynamic partitions and created the temp views needed
  • Created external Hive tables with proper static and dynamic partitions and worked on them using HiveQL (see the partitioning sketch after this list)
  • Hands-on experience creating views and table sampling, and implemented numerous Hive queries
  • Performed data cleansing on the input data using Spark transformations and actions
  • Involved in analytics and visualization of log data for anomaly detection and for estimating the probability of future occurrences using regression models
  • Collected log data from web servers and integrated it into HDFS using Flume
  • Wrote HiveQL queries integrating different tables to create views and produce result sets
  • Created Oozie workflows to upload data to HDFS and run HiveQL analysis
  • Experienced in developing applications in Hadoop, Impala, Hive, Sqoop, Oozie, Java MapReduce, Spark SQL, HDFS, and Pig
  • Involved in migrating HiveQL to Impala to minimize query response time
  • Developed complex queries using Hive and Impala
  • Used Impala to read, write, and query Hadoop data in HDFS from HBase or Cassandra, and configured Kafka to read and write messages from external programs
  • Migrated an existing on-premises application to AWS
  • Used AWS services such as EC2 and S3 for small-data-set processing and storage; experienced in maintaining the Hadoop cluster on AWS EMR
  • Environment: Power BI, Hive, Flume, Maven, Impala, AWS, GCP, YARN, GitHub, MRUnit, Sqoop, HBase, Hadoop, HDFS, Spark, Java, Scala, AWS (EC2, S3, and Redshift)
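A minimal sketch of the external-table and dynamic-partition work described above, driven through Spark SQL. The table schema, HDFS location, and the `staging_logs` source table are hypothetical assumptions.

    import org.apache.spark.sql.SparkSession

    object HivePartitionSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-partition-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Allow fully dynamic partition inserts
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

        // External table partitioned by load date (schema and location are hypothetical)
        spark.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS logs (
            |  user_id STRING,
            |  action  STRING
            |)
            |PARTITIONED BY (load_date STRING)
            |STORED AS PARQUET
            |LOCATION 'hdfs:///warehouse/external/logs'""".stripMargin)

        // Dynamic-partition insert: load_date comes from the data itself
        // (`staging_logs` is a hypothetical source table)
        spark.sql(
          """INSERT OVERWRITE TABLE logs PARTITION (load_date)
            |SELECT user_id, action, load_date FROM staging_logs""".stripMargin)

        spark.stop()
      }
    }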

Power BI Developer/ Hadoop

Streamline Health
OH
01.2018 - 10.2018
  • Configured different topologies for Spark clusters and deployed them on a regular basis
  • Worked on Spark SQL, reading and writing data from JSON, text, and Parquet files and schema RDDs
  • Loaded data into HBase using both bulk and non-bulk loads
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop
  • Developed MapReduce programs to parse and filter the raw data, and partitioned tables
  • Created Hive queries that helped market analysts spot emerging trends by comparing incremental data
  • Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into those tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns
  • Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries
  • Experience processing large volumes of data and running processes in parallel using Talend
  • Worked on Spark SQL, SQL/RDBMS, and DataFrames for faster execution of Hive queries using the Spark SQLContext
  • Performed analysis on implementing Spark using Scala and wrote sample Spark programs using PySpark
  • Deployed and built the applications using Maven
  • Processed unstructured files such as XML and JSON using a custom-built Java API and pushed them into MongoDB
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the RDD sketch after this list)
  • Coordinated resources by working closely with project managers for releases, and carried out deployments and builds on various environments using continuous integration tools
  • Collaborated with IT teams in designing the physical infrastructure required for deploying the business intelligence solutions
  • Utilized BI tools to design and deploy customer interfaces
  • Environment: Spark, Oozie, GitHub, JUnit, Cloudera, Sqoop, HDFS, Hadoop, Hive, Spark SQL, Java, Scala, Talend
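A minimal sketch of converting a Hive/SQL aggregate query into Spark RDD transformations in Scala, as mentioned above, using made-up inline data in place of a real Hive table.

    import org.apache.spark.sql.SparkSession

    object HiveToRddSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("hive-to-rdd-sketch").getOrCreate()
        val sc = spark.sparkContext

        // Inline lines standing in for a Hive table: "user_id,amount"
        val lines = sc.parallelize(Seq("u1,10.0", "u2,5.0", "u1,2.5"))

        // Equivalent of:
        //   SELECT user_id, SUM(amount) FROM t GROUP BY user_id HAVING SUM(amount) > 5
        val totals = lines
          .map(_.split(","))
          .map(f => (f(0), f(1).toDouble))       // project the needed columns
          .reduceByKey(_ + _)                    // GROUP BY user_id, SUM(amount)
          .filter { case (_, sum) => sum > 5.0 } // HAVING clause

        totals.collect().foreach(println)
        spark.stop()
      }
    }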

Java Developer

Aradhya Tech pvt ltd
India
05.2015 - 06.2016
  • Performed analysis of client requirements based on the developed detailed design documents
  • Developed use cases, class diagrams, sequence diagrams, and data models using Microsoft Visio
  • Developed Struts forms and actions for validation of user request data and application functionality
  • Developed JSPs with Struts custom tags and implemented JavaScript validation of data
  • Performed systems management and integration functions using MuleSoft, improved existing computer systems, and reviewed computer system capabilities, workflow, and schedule limitations
  • Developed programs for accessing the database using the JDBC thin driver to execute queries, prepared statements, and stored procedures, and to manipulate data in the database (see the JDBC sketch after this list)
  • Developed the application using the J2EE architecture
  • Involved in developing JSP forms
  • Designed and developed web pages using HTML and JSP
  • Involved in developing the business tier using stateless session beans
  • Used JavaScript for web page validation and the Struts Validator for server-side validation
  • Designed the database and coded SQL, PL/SQL, triggers, and views using IBM DB2
  • Developed message-driven beans for asynchronous processing of alerts
  • Used JDBC for database connectivity with MySQL Server
  • Used CVS for version control
  • Used ClearCase for source code control and JUnit for unit testing
  • Involved in peer code reviews and performed integration testing of the modules
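A minimal sketch of the JDBC prepared-statement pattern described above, over the standard java.sql API. It is written in Scala for consistency with the other sketches (the original work was in Java against the same API); the connection URL, credentials, table, and columns are hypothetical.

    import java.sql.DriverManager

    object JdbcSketch {
      def main(args: Array[String]): Unit = {
        // Connection details, table, and columns are hypothetical
        val conn = DriverManager.getConnection(
          "jdbc:mysql://localhost:3306/appdb", "app_user", "secret")
        try {
          // A PreparedStatement binds parameters safely and reuses the query plan
          val ps = conn.prepareStatement(
            "SELECT id, status FROM alerts WHERE created_at > ? AND status = ?")
          ps.setDate(1, java.sql.Date.valueOf("2016-01-01"))
          ps.setString(2, "OPEN")
          val rs = ps.executeQuery()
          while (rs.next()) println(s"${rs.getLong("id")} -> ${rs.getString("status")}")
        } finally conn.close()
      }
    }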

Education

MBA: Data Analytics

University of New Haven

Bachelor of Science: Applied Economics

Maharaja Sayajirao University Of Baroda

Bachelor of Science: Computer Science And Programming

Maharaja Sayajirao University Of Baroda

Skills

  • SQL, PL/SQL, T-SQL, Python (Matplotlib, Seaborn, NumPy, SciPy, scikit-learn), Power BI, RStudio, Alteryx, SSRS, HTML, XML, Node.js, Unix, JavaScript
  • Oracle 11g/10g/9i, SQL Server 2012/2008 R2, Hadoop, Hive, MongoDB, MySQL, Netezza, SAS, U-SQL, Snowflake
  • SSIS, BI Development Studio, Power BI, Visual Studio 2012, Performance Monitor, Pivot, Spark, Scala, Kafka, DevOps
  • Linear Regression, Logistic Regression, LDA, PCA (Principal Component Analysis), K-Means Clustering, K-Nearest Neighbors (KNN), Decision Trees, AdaBoost, Gradient-Boosted Trees, Neural Networks
