
Binish Abraham

Bellerose, NY

Summary

Focused and detail-oriented Big Data Engineering professional with experience in distributed processing and various cloud technologies. Experienced in various ML algorithms and statistical models. 13 years of experience across a vast technology space and in consulting roles for clients from different domains. Practical exposure and strong knowledge in big data management using Hadoop, HDFS, Hive, Spark, HBase, Sqoop, and Impala. Excellent understanding of the Hadoop Distributed File System, the Spark processing engine, data modeling, architecture, and design principles. Experience architecting, designing, and building distributed data pipelines in the cloud and on-prem. Experience deploying REST API endpoints using Java, Spring Boot, and Swagger. Development experience in Java, Scala, Python, and R. Experience with Cassandra, MongoDB, Riak, GlusterFS, CI/CD, REST APIs, Jenkins, shell scripting, and PL/SQL. Implemented search solutions using Elasticsearch and Solr. Excellent exposure to Linux OS internals and servers; RHCE certified. Experience with Docker container technology and its management using Kubernetes. Experience with AWS and GCP clouds.

Overview

17 years of professional experience
1 Certification

Work History

Big Data Developer

Bank Of America
New York, NY
06.2022 - 04.2024
  • Designed and developed Hive and Impala queries, Python scripts, Scala code changes, and shell scripts for ETL based on end-user requirements
  • Designed, developed, and modified SQL queries based on HQL (Hive Query Language) for business needs, executed in a Spark environment
  • Architected and implemented a robust Hadoop infrastructure, deploying the Hadoop Distributed File System (HDFS) and setting up clusters for efficient data storage and processing
  • Implemented robust data quality controls, including validation checks and error-handling mechanisms, to ensure data accuracy and consistency throughout the platform
  • Provided data for models by running multiple data pipelines that extract data from multiple sources, export the required data, and combine them
  • Performed data transformations and loading using PySpark and Python for ML projects
  • Analyzed Spark and Hadoop logs via the UI and by SSH into nodes to find and resolve issues
  • Maintained code in Bitbucket and participated in monthly release activities
  • Technologies: Hadoop, Spark, Hive, Impala, Python, Toad, HUE, HiveQL, Autosys, Bitbucket

Big Data Engineer/Architect

Cognizant Technology Solutions Corp
03.2007 - 10.2018
  • As a big data expert, architected, developed, and implemented various big data solutions
  • Set up Kubernetes, Hadoop, and Elasticsearch clusters, and developed and deployed containerized solutions using Docker
  • Implemented solutions at client data centers across various domains
  • Used Docker and Kubernetes in Linux environments to collect logs from various sources
  • Analyzed various machine learning libraries and cyber-attack techniques to implement a 'self-healing' system
  • Designed and developed REST API endpoints to expose services using Java and Spring Boot
  • USA/India
  • Technologies: AWS (EC2, EMR, S3, RDS), Datastax Cassandra, Swagger, Jenkins, Sonatype Nexus

Big Data Engineer

Turner Broadcasting System
New York City, NY
09.2017 - 03.2018
  • Designed and developed Hive and Presto queries to run in an Amazon cluster for the ETL process
  • Optimized queries, improving overall execution by 15%
  • Identified various data sources and worked with an international team to pull data from DFP, FreeWheel, STAQ, etc.
  • Technologies: AWS S3, Hadoop, Spark, Hive, Looker, Qubole, Presto, Python
  • Contract through Cognizant

Architect

Comcast Corporation
Mt Laurel, NJ
02.2017 - 08.2017
  • As an architect, designed and co-developed a big data solution that collects high-volume data from various sources
  • Generated meaningful representations of the available data for use by data scientists
  • Designed Zeppelin reports for use by decision makers
  • Developed data pipelines using PySpark for data transformation and loading
  • Technologies: Hive, Spark, Scala, Kafka, Flume, NiFi, shell scripts, Zeppelin, HDP 2.5, Hue 3.11
  • Contract through Cognizant

Product Specialist and Onsite Coordinator

BJ's Wholesale
10.2016 - 01.2017
  • Implemented Cognizant Activity Intelligence at the client location
  • Coordinated with the offshore team on customizations based on BJ's special requirements
  • Technologies: Kafka, Flume, Spark, PySpark, Spring 4.3, Spring Boot, Elasticsearch, shell scripts, OrientDB, MariaDB
  • Contract through Cognizant

Big Data Consultant

Comcast Corporation
Mt. Laurel, NJ
06.2015 - 09.2016
  • Worked with client architects to design and implement a Google-like search solution for customer data
  • Improved search efficiency by 25% using an Elasticsearch cluster
  • Coordinated onsite and offshore teams to deliver the full solution
  • Contract through Cognizant
  • Technologies: Hadoop, Cassandra, Elasticsearch, Java, RSA, shell scripts, Oracle

Big Data Consultant

Siemens Healthcare
Malvern, PA
10.2014 - 05.2015
  • Increased patient allocation efficiency by 30% by implementing big data technologies in the allocation system
  • As a big data consultant, worked with client architects to implement system enhancements and maintain the application at the client site
  • Contract through Cognizant
  • Technologies: Spring, Hadoop, HBase, Oozie, MongoDB, Sqoop, Solr, Maven, shell scripting, SQL Server

Onsite Lead

New York Times
New York City, NY
06.2014 - 09.2014
  • Improved system efficiency by rewriting client code using Spring
  • Interacted with upstream and downstream system architects to ensure proper implementation
  • Contract through Cognizant
  • Technologies: J2EE, Spring Integration, Maven, shell scripting, Oracle 10g, SQL Developer, SoapUI

Big Data Developer

ITHAKA
Ann Arbor, MI
04.2013 - 05.2014
  • As an onsite Hadoop and Cassandra consultant, developed the Maple and Cedar applications using the AWS stack
  • Developed REST API endpoints to expose services related to Cedar and Maple
  • Contract through Cognizant
  • Technologies: AWS (EC2, EMR, S3, RDS), Datastax Cassandra, Spring 3.2, ZooKeeper, Swagger, Solr, Jenkins, Sonatype Nexus, Maven, Git, Elasticsearch, REST API

Offshore Team Lead

Mattel
Chennai, India
07.2012 - 03.2013
  • Converted a Java application to the MapReduce framework for execution on a Hadoop cluster, improving efficiency by 100%
  • Designed and developed the application using the Hadoop stack and delivered the solution within a tight timeframe
  • Executed proofs of concept based on big data technologies
  • Contract through Cognizant
  • Technologies: J2EE, Hadoop, Cassandra, MongoDB, HBase, Flume, Postgres

Hadoop and Cassandra Specialist

BNY Mellon
New York City, NY
01.2012 - 06.2012
  • As a Hadoop and Cassandra specialist, designed and developed solutions using Java and implemented them at an on-premise data center
  • Contract through Cognizant
  • Technologies: Java, Hadoop, Bash scripting, Cassandra, Memcached

Big Data Developer

Cognizant Technology Solutions Corp
Chennai, India
03.2007 - 12.2011

Education

Master of Science - Data Science

St. John's University
Queens, NY
05.2021

Bachelor's Degree - Computer Science and Engineering

Mahatma Gandhi University
Kerala, India
05.2006

Skills

  • Distributed processing frameworks
  • Hadoop and Spark
  • NoSQL databases
  • MongoDB and Cassandra
  • Data streaming with Kafka
  • Containerization with Docker
  • Shell scripting (Bash)
  • ETL processes and tools
  • Workflow orchestration with Airflow
  • Version control with Git
  • Data storage systems (HDFS, HBase)
  • Data querying with Hive and HiveQL
  • Cloud services (AWS, Google App Engine)
  • Database management (MariaDB, Oracle)
  • Programming languages (Java, Python, Scala, R)
  • Machine learning frameworks (Keras, TensorFlow)
  • Data visualization tools (Looker, Zeppelin)
  • Data analysis platforms (Jupyter Notebook, Databricks)

Certification

  • Databricks Certified Associate Developer for Apache Spark 3.0
  • Red Hat Certified Engineer (RHCE)

Accomplishments

  • Optimized Data Processing: Reduced data processing time by 30% using optimized Spark and Hadoop configurations.
  • Hadoop Deployment Success: Led successful deployment of Hadoop cluster, on-prem and in AWS cloud and converted applications for running in distributed clusters.
  • Enhanced ETL Accuracy: Improved accuracy of ETL scripts by 15% through advanced data validation.
  • Efficient Data Modeling: Designed a data model that reduced query response time by 25% using advanced indexing techniques.

Timeline

Big Data Developer

Bank Of America
06.2022 - 04.2024

Big Data Engineer

Turner Broadcasting System
09.2017 - 03.2018

Architect

Comcast Corporation
02.2017 - 08.2017

Product Specialist and Onsite Coordinator

BJ's Wholesale
10.2016 - 01.2017

Big Data Consultant

Comcast Corporation
06.2015 - 09.2016

Big Data Consultant

Siemens Healthcare
10.2014 - 05.2015

Onsite Lead

New York Times
06.2014 - 09.2014

Big Data Developer

ITHAKA
04.2013 - 05.2014

Offshore Team Lead

Mattel
07.2012 - 03.2013

Hadoop and Cassandra Specialist

BNY Mellon
01.2012 - 06.2012

Big Data Engineer/Architect

Cognizant Technology Solutions Corp
03.2007 - 10.2018

Big Data Developer

Cognizant Technology Solutions Corp
03.2007 - 12.2011

Master of Science - Data Science

St. John's University

Bachelor's Degree - Computer Science and Engineering

Mahatma Gandhi University