SHIVAM BHATIA

Summary

Senior IT professional, experienced and adept in designing, implementing, and maintaining solutions on the Big Data ecosystem.
  • Deep understanding of Hadoop architecture, the Spark execution engine, Hive data warehousing, and NoSQL databases.
  • Extensive work implementing and optimizing Hadoop/MapReduce algorithms for Big Data analytics, including MapReduce programs written in Java that transform data sets with mapper and reducer tasks.
  • Developed reusable, configurable components in Scala and Python as part of project requirements.
  • Good knowledge of Scala's functional programming techniques, such as anonymous functions (closures), currying, higher-order functions, and pattern matching (see the sketch below).
  • Experience using Sqoop to import data from RDBMS into HDFS and to export it back.
  • Good knowledge of NoSQL databases such as Cassandra, MongoDB, and HBase.
  • Strong problem-solving and analytical skills, with the ability to make balanced, independent decisions.
  • Proficient with version control systems (GitHub, SVN), the Maven build tool, and Jenkins for continuous integration.
  • Strong debugging and critical-thinking abilities, with a good understanding of advances in frameworks, methodologies, and strategies.
  • Efficient cloud engineer with years of experience assembling cloud infrastructure; negotiates with vendors, coordinates tasks with other IT team members, and applies best practices when building applications and databases on clouds such as AWS and GCP.
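A brief sketch of the Scala functional techniques named above. This is illustrative only, not project code; all names and values are made up.

    object FunctionalSketch extends App {
      // Anonymous function that closes over `rate` (a closure)
      val rate = 0.07
      val withTax: Double => Double = price => price * (1 + rate)

      // Currying: a function with two parameter lists, partially applied
      def add(a: Int)(b: Int): Int = a + b
      val addTen = add(10) _ // Int => Int

      // Higher-order function: map takes `withTax` as an argument
      val totals = List(100.0, 250.0).map(withTax)

      // Pattern matching on value, type, and guard
      def describe(x: Any): String = x match {
        case 0               => "zero"
        case n: Int if n > 0 => "a positive integer"
        case s: String       => s"a string of length ${s.length}"
        case _               => "something else"
      }

      println(totals)         // tax-adjusted amounts
      println(addTen(5))      // 15
      println(describe("hi")) // a string of length 2
    }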

Overview

9 years of professional experience

Work History

Big Data Developer

Bank of America Corporation
Newark, DE
10.2021 - 12.2023
  • Configured different topologies for Spark clusters and deployed them on a regular basis
  • Worked on Spark SQL, reading and writing data from JSON, text, and Parquet files and schema RDDs (see the Spark SQL sketch after this list)
  • Loaded data into HBase using both bulk and non-bulk loads
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop
  • Used Impala to read, write, and query Hadoop data in HDFS from HBase or Cassandra, and configured Kafka to read and write messages from external programs
  • Experience working with SQL, HiveQL, Spark SQL, and shell scripts, as well as views, indexes, stored procedures, and other components of database applications
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala
  • Coordinated resources by working closely with project managers for releases, and carried out deployments and builds on various environments using continuous integration tools
  • Created Hive queries that helped market analysts spot emerging trends by comparing incremental data
  • Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into those tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns
  • Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries
  • Environment: Hive, Airflow, AWS EMR, AWS Glue, Maven, Impala, Spark, YARN, GitHub, Tableau, Unix, Cloudera, GCP, AWS, Redshift, Snowflake, Sqoop, HDFS, Scala, Spark SQL, Python, Informatica Big Data Edition, PowerCenter, Beeline
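A minimal sketch of the Spark SQL pattern described above: reading JSON, expressing a Hive-style query as a Spark transformation, and persisting results. The paths, view name, and Hive table are hypothetical, not taken from the project.

    import org.apache.spark.sql.{SaveMode, SparkSession}

    object SparkSqlIoSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("spark-sql-io-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Read a semi-structured JSON source (path is hypothetical)
        val events = spark.read.json("hdfs:///landing/events/")

        // Express a Hive-style query as a Spark SQL transformation
        events.createOrReplaceTempView("events")
        val daily = spark.sql(
          """SELECT event_date, COUNT(*) AS event_count
            |FROM events
            |GROUP BY event_date""".stripMargin)

        // Persist results as Parquet and as a Hive table (names are hypothetical)
        daily.write.mode(SaveMode.Overwrite).parquet("hdfs:///curated/daily_counts/")
        daily.write.mode(SaveMode.Overwrite).saveAsTable("analytics.daily_counts")

        spark.stop()
      }
    }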

Big Data/ Hadoop Developer

Charles Schwab
TX
01.2020 - 09.2021
  • Responsible for building scalable distributed data solutions using Hadoop
  • Developed and implemented automation scripts for Azure services using PowerShell, Python, and Bash
  • Configured, monitored, and maintained Azure Storage Accounts, Virtual Networks, App Services, Web Apps, and VMs
  • Managed Azure IaaS and PaaS resources such as SQL Database, Service Bus Queues, Event Hubs, and Automation Accounts
  • Worked in a fast-paced agile development environment to quickly analyze, develop, and test potential use cases for the business
  • Ingested data from the mainframe to the Hadoop landing zone using PowerCenter to handle VSAM files, and developed Spark scripts to cleanse data and apply required business transformations before processing through the data warehouse
  • Designed and developed ETL integration patterns using Python on Spark
  • Designed and maintained data governance and security for data platforms on AWS Cloud
  • Created the source and destination endpoints for an Aurora RDS migration using AWS Database Migration Service
  • Created the task to transfer data from source to destination and used the AWS Schema Conversion Tool to create the schema in the target database
  • Developed a framework for converting existing Informatica mappings into PySpark (Python on Spark) jobs
  • Developed ETL mappings, reusable transformations, and mapplets using the Informatica Developer tool to extract data from ingestion target tables/HDFS files, and applied technical and business transformations to load data into enterprise data warehouse systems
  • Developed Python scripts using both DataFrames/SQL and RDDs/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into HDFS (see the aggregation sketch after this list)
  • Tuned Spark applications by setting the right batch interval and the correct level of parallelism and by tuning memory; loaded data into Spark RDDs and performed in-memory computation to generate the desired output
  • Enhanced the decision-making process with RDBMS data
  • Worked with application teams on Hadoop updates, patches, and version upgrades as required
  • Worked as a lead to test the disaster recovery process and developed a Python script, run every three months, to validate production vs. DR cluster data
  • Environment: Power BI, Hive, Maven, Impala, Spark, YARN, GitHub, Tableau, Unix, Cloudera, AWS, GCP, Redshift, Sqoop, HDFS, Scala, Spark SQL, Python, Informatica Big Data Edition, PowerCenter, Beeline
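A minimal sketch contrasting the DataFrame/SQL and RDD styles of aggregation mentioned above. The data is made up, and the code targets the modern SparkSession API rather than the Spark 1.6 API named in the role; the shuffle-partitions setting is one illustrative tuning knob.

    import org.apache.spark.sql.SparkSession

    object AggregationSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("aggregation-sketch")
          // Illustrative tuning knob: shuffle parallelism
          .config("spark.sql.shuffle.partitions", "200")
          .getOrCreate()
        import spark.implicits._

        val sales = Seq(("east", 10.0), ("west", 5.0), ("east", 2.5))
          .toDF("region", "amount")

        // DataFrame/SQL-style aggregation
        val byRegionDf = sales.groupBy("region").sum("amount")

        // Equivalent RDD-style aggregation
        val byRegionRdd = sales.rdd
          .map(row => (row.getString(0), row.getDouble(1)))
          .reduceByKey(_ + _)

        byRegionDf.show()
        byRegionRdd.collect().foreach(println)
        spark.stop()
      }
    }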

Power BI Developer/ Hadoop

Walgreens
WA
11.2018 - 12.2019
  • Responsible for building scalable distributed data solutions using Power BI
  • Developed and implemented data warehouses to store the company's business intelligence
  • Provided technical expertise in developing, deploying, and optimizing business intelligence solutions
  • Created complex SQL queries for extracting data from multiple sources
  • Conducted unit testing of BI applications to identify bugs or errors before production deployment
  • Designed high-availability solutions by leveraging load balancers, availability sets, and Traffic Manager profiles in the Azure environment
  • Integrated third-party applications with Microsoft Azure using APIs and RESTful web services
  • Involved in writing Spark applications in Scala for ETL and data analysis
  • Implemented AWS Step Functions to automate and orchestrate Amazon SageMaker tasks such as publishing data to S3, training the ML model, and deploying it for prediction
  • Integrated Apache Airflow with AWS to monitor multi-stage ML workflows with tasks running on Amazon SageMaker
  • Loaded data from different relational databases into HDFS using Sqoop
  • Experience working with SQL, HiveQL, Spark SQL, and shell scripts, as well as views, indexes, stored procedures, and other components of database applications
  • Worked on performance tuning of Spark DataFrames for aggregation using dynamic partitions and created the temp views needed
  • Created external Hive tables with proper static and dynamic partitions and worked on them using HiveQL (see the partitioning sketch after this list)
  • Hands-on experience creating views and table sampling, and implemented numerous Hive queries
  • Performed data cleansing on the input data using Spark transformations and actions
  • Involved in analytics and visualization of log data for anomaly detection and for estimating the probability of future occurrences using regression models
  • Collected log data from web servers and integrated it into HDFS using Flume
  • Wrote HiveQL queries integrating different tables to create views and produce result sets
  • Created Oozie workflows to upload data to HDFS and run HiveQL analysis
  • Experienced in developing applications in Hadoop, Impala, Hive, Sqoop, Oozie, Java MapReduce, Spark SQL, HDFS, and Pig
  • Involved in migrating HiveQL to Impala to minimize query response time
  • Developed complex queries using Hive and Impala
  • Used Impala to read, write, and query Hadoop data in HDFS from HBase or Cassandra, and configured Kafka to read and write messages from external programs
  • Migrated an existing on-premises application to AWS
  • Used AWS services such as EC2 and S3 for small-data-set processing and storage; experienced in maintaining the Hadoop cluster on AWS EMR
  • Environment: Power BI, Hive, Flume, Maven, Impala, AWS, GCP, YARN, GitHub, MRUnit, Sqoop, HBase, Hadoop, HDFS, Spark, Java, Scala, AWS (EC2, S3, and Redshift)
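A minimal sketch of the external-table and dynamic-partition work described above, driven through Spark SQL. The table schema, HDFS location, and the `staging_logs` source table are hypothetical assumptions.

    import org.apache.spark.sql.SparkSession

    object HivePartitionSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-partition-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Allow fully dynamic partition inserts
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

        // External table partitioned by load date (schema and location are hypothetical)
        spark.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS logs (
            |  user_id STRING,
            |  action  STRING
            |)
            |PARTITIONED BY (load_date STRING)
            |STORED AS PARQUET
            |LOCATION 'hdfs:///warehouse/external/logs'""".stripMargin)

        // Dynamic-partition insert: load_date comes from the data itself
        // (`staging_logs` is a hypothetical source table)
        spark.sql(
          """INSERT OVERWRITE TABLE logs PARTITION (load_date)
            |SELECT user_id, action, load_date FROM staging_logs""".stripMargin)

        spark.stop()
      }
    }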

Power BI Developer/ Hadoop

Streamline Health
OH
01.2018 - 10.2018
  • Configured different topologies for Spark clusters and deployed them on a regular basis
  • Worked on Spark SQL, reading and writing data from JSON, text, and Parquet files and schema RDDs
  • Loaded data into HBase using both bulk and non-bulk loads
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop
  • Developed MapReduce programs to parse and filter the raw data, and partitioned tables
  • Created Hive queries that helped market analysts spot emerging trends by comparing incremental data
  • Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into those tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns
  • Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries
  • Experience processing large volumes of data and running processes in parallel using Talend
  • Worked on Spark SQL, SQL/RDBMS, and DataFrames for faster execution of Hive queries using the Spark SQLContext
  • Performed analysis on implementing Spark using Scala and wrote sample Spark programs using PySpark
  • Deployed and built the applications using Maven
  • Processed unstructured files such as XML and JSON using a custom-built Java API and pushed them into MongoDB
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the RDD sketch after this list)
  • Coordinated resources by working closely with project managers for releases, and carried out deployments and builds on various environments using continuous integration tools
  • Collaborated with IT teams in designing the physical infrastructure required for deploying the business intelligence solutions
  • Utilized BI tools to design and deploy customer interfaces
  • Environment: Spark, Oozie, GitHub, JUnit, Cloudera, Sqoop, HDFS, Hadoop, Hive, Spark SQL, Java, Scala, Talend
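A minimal sketch of converting a Hive/SQL aggregate query into Spark RDD transformations in Scala, as mentioned above, using made-up inline data in place of a real Hive table.

    import org.apache.spark.sql.SparkSession

    object HiveToRddSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("hive-to-rdd-sketch").getOrCreate()
        val sc = spark.sparkContext

        // Inline lines standing in for a Hive table: "user_id,amount"
        val lines = sc.parallelize(Seq("u1,10.0", "u2,5.0", "u1,2.5"))

        // Equivalent of:
        //   SELECT user_id, SUM(amount) FROM t GROUP BY user_id HAVING SUM(amount) > 5
        val totals = lines
          .map(_.split(","))
          .map(f => (f(0), f(1).toDouble))       // project the needed columns
          .reduceByKey(_ + _)                    // GROUP BY user_id, SUM(amount)
          .filter { case (_, sum) => sum > 5.0 } // HAVING clause

        totals.collect().foreach(println)
        spark.stop()
      }
    }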

Java Developer

Aradhya Tech pvt ltd
India
05.2015 - 06.2016
  • Performed analysis of client requirements based on the developed detailed design documents
  • Developed use cases, class diagrams, sequence diagrams, and data models using Microsoft Visio
  • Developed Struts forms and actions for validation of user request data and application functionality
  • Developed JSPs with Struts custom tags and implemented JavaScript validation of data
  • Performed systems management and integration functions using MuleSoft, improved existing computer systems, and reviewed computer system capabilities, workflow, and schedule limitations
  • Developed programs for accessing the database using the JDBC thin driver to execute queries, prepared statements, and stored procedures, and to manipulate data in the database (see the JDBC sketch after this list)
  • Developed the application using the J2EE architecture
  • Involved in developing JSP forms
  • Designed and developed web pages using HTML and JSP
  • Involved in developing the business tier using stateless session beans
  • Used JavaScript for web page validation and the Struts Validator for server-side validation
  • Designed the database and coded SQL, PL/SQL, triggers, and views using IBM DB2
  • Developed message-driven beans for asynchronous processing of alerts
  • Used JDBC for database connectivity with MySQL Server
  • Used CVS for version control
  • Used ClearCase for source code control and JUnit for unit testing
  • Involved in peer code reviews and performed integration testing of the modules
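A minimal sketch of the JDBC prepared-statement pattern described above, over the standard java.sql API. It is written in Scala for consistency with the other sketches (the original work was in Java against the same API); the connection URL, credentials, table, and columns are hypothetical.

    import java.sql.DriverManager

    object JdbcSketch {
      def main(args: Array[String]): Unit = {
        // Connection details, table, and columns are hypothetical
        val conn = DriverManager.getConnection(
          "jdbc:mysql://localhost:3306/appdb", "app_user", "secret")
        try {
          // A PreparedStatement binds parameters safely and reuses the query plan
          val ps = conn.prepareStatement(
            "SELECT id, status FROM alerts WHERE created_at > ? AND status = ?")
          ps.setDate(1, java.sql.Date.valueOf("2016-01-01"))
          ps.setString(2, "OPEN")
          val rs = ps.executeQuery()
          while (rs.next()) println(s"${rs.getLong("id")} -> ${rs.getString("status")}")
        } finally conn.close()
      }
    }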

Education

MBA: Data Analytics

University of New Haven

Bachelor of Science: Applied Economics

Maharaja Sayajirao University Of Baroda

Bachelor of Science: Computer Science And Programming

Maharaja Sayajirao University Of Baroda

Skills

  • SQL, PL/SQL, T-SQL, Python (Matplotlib, Seaborn, NumPy, SciPy, scikit-learn), Power BI, RStudio, Alteryx, SSRS, HTML, XML, Node.js, Unix, JavaScript
  • Oracle 11g/10g/9i, SQL Server 2012/2008 R2, Hadoop, Hive, MongoDB, MySQL, Netezza, SAS, U-SQL, Snowflake
  • SSIS, BI Development Studio, Power BI, Visual Studio 2012, Performance Monitor, Pivot, Spark, Scala, Kafka, DevOps
  • Linear Regression, Logistic Regression, LDA, PCA (Principal Component Analysis), K-Means Clustering, K-Nearest Neighbors (KNN), Decision Trees, AdaBoost, Gradient-Boosted Trees, Neural Networks
