Arun Kumar Rajasekaran

Secaucus, NJ

Summary

  • 10+ years of proven ability to deliver solutions that meet client requirements by integrating apparently disparate platforms.
  • 7+ years of experience in Big Data platforms.
  • 1.5 years of experience in the US as a Senior Big Data Engineer.
  • Extensive experience in Big Data solutions using the Hadoop ecosystem, Azure Databricks, Snowflake, and Hive.
  • Versatile in providing solutions for cloud-based IaaS and CaaS systems using AWS and Docker.
  • Expertise in components of Azure Databricks and the Hadoop ecosystem: Databricks, Data Factory, Delta Lake, Hive, Pig, Sqoop, Impala, Flume, Zookeeper, Oozie, Airflow, and Apache Spark.
  • Experienced in creating, debugging, scheduling, and monitoring jobs using Azure Data Factory, Airflow, and Oozie.
  • Sound experience in big data ingestion, aggregation, suppression, storage, querying, processing, and analysis.
  • Excellent understanding/knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, YARN and MapReduce programming paradigm.
  • Very good experience in application development and maintenance across SDLC projects using technologies such as Node.js, React, JavaScript, Scala, PHP, data structures, and UNIX shell scripting.
  • Strong understanding of the AWS product and service suite, primarily Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, SNS, SES, and SQS, including their applicable use cases, best practices, implementation, and support considerations.
  • Experience in importing and exporting data from different RDBMS like MySQL, Oracle and SQL Server into HDFS and Hive using Sqoop.
  • Expertise in designing clustered tables on Snowflake database to improve query performance for consumers.
  • In-Depth understanding of Snowflake as SaaS cloud technology.
  • Good understanding of NoSQL databases and hands on experience in writing applications on NoSQL databases like HBase.
  • Experience in developing custom MapReduce programs using Apache Hadoop to perform Data Transformation and analysis as per requirement.
  • Experience in creating internal and external tables and implementing performance-improvement techniques using partitioned and bucketed tables in Hive.
  • Experience in analyzing data using HBase and custom Map Reduce programs in Java.
  • Experience in creating Pig and Hive UDFs in Java to analyze data sets.
  • Worked on reading multiple data formats on HDFS using Spark APIs.

Overview

10 years of professional experience

Work History

Lead Big Data Engineer

NPD Group
New York, NY
03.2022 - Current

NPD, LPFG (March 2022 onwards). Team size: 12

NPD is a leading retail analytics company based in the USA. NPD offers data, industry expertise, and prescriptive analytics to help its clients grow their businesses in a changing world. More than 2,000 of the world's leading brands and retailers are NPD's customers. NPD helps clients measure, predict, and improve performance across all channels; performs performance benchmarking; guides strategic decision-making; and improves pricing, product management, new product innovation, customer segmentation, assortment, data-based insights, and sales forecasts.

NPD LPFG:

LPFG is a cloud-based platform that migrates some of the on-premises Hadoop pipelines to Azure cloud services for better performance. On HDFS, Oracle dictionary definitions are applied to POS data coming from various retailers, and the data is moved to the Azure cloud for compute steps such as transformation, augmentation, enrichment, filtering, grouping, aggregation, and running algorithms against the data. Pipelines are created in Azure Data Factory, and the final data is stored in Azure Delta Lake and delivered as binary files to the DataMart for analytics. The highly available cloud platform is built with more than 100 clusters and processes 5+ PB of data for various retailers.
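
The grouping, aggregation, and suppression steps named above can be sketched in plain Python; this is an illustrative simplification, not the actual Databricks/Spark code, and the field names and suppression threshold are assumptions:

```python
from collections import defaultdict

def aggregate_pos(records, min_retailers=3):
    """Group POS records by item, sum sales, and suppress aggregates
    reported by too few retailers (hypothetical privacy threshold)."""
    totals = defaultdict(float)
    retailers = defaultdict(set)
    for rec in records:
        totals[rec["item"]] += rec["sales"]
        retailers[rec["item"]].add(rec["retailer"])
    # Suppression: drop aggregates that could expose a single retailer's data
    return {item: total for item, total in totals.items()
            if len(retailers[item]) >= min_retailers}

records = [
    {"item": "tv", "retailer": "A", "sales": 100.0},
    {"item": "tv", "retailer": "B", "sales": 50.0},
    {"item": "tv", "retailer": "C", "sales": 25.0},
    {"item": "radio", "retailer": "A", "sales": 10.0},
]
print(aggregate_pos(records))  # {'tv': 175.0} — 'radio' is suppressed
```

In the real pipeline these steps run as Spark transformations over partitioned Delta Lake tables; the dictionary-based sketch only shows the shape of the logic.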

Responsibilities:

  • Managing offshore team deliverables and coordinating with the team.
  • Involved in Analysis, Design, System architectural design, Process interfaces design, design documentation.
  • Responsible for developing prototypes of selected solutions and implementing complex big data projects with focus on collecting, parsing, managing, analyzing and visualizing large sets of data using multiple platforms.
  • Understand how to apply new technologies to solve big data problems and to develop innovative big data solutions.
  • Developed various data loading strategies and performed various transformations for analyzing datasets by using Azure Databricks.
  • Responsible for processing the raw geo files from various retailers using Azure DataBricks.
  • Responsible for handling large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, effective and efficient joins, transformations, and other techniques during the ingestion process itself.
  • Developed Scala/Spark jobs for data transformation and aggregation using Databricks notebooks.
  • Built Data Factory pipelines to run end-to-end jobs for data processing and deliver the binary files to the Sybase DataMart.
  • Worked on data migration from the on-premises Hadoop cluster to Azure cloud storage using AzCopy.
  • Managing metadata, pipeline performance, job rules, and job statuses on the Azure cloud.
  • Responsible for optimizing Spark queries and improving performance across the platform.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.

Technologies: Azure Databricks, Delta Lake, Azure Data Factory, Spark/Scala, Oracle, Python, pandas, SVN, PuTTY, Oracle SQL Developer, DBeaver, MySQL Workbench, JCA, Visual Studio Code, GlobalProtect, Termius, Azure DevOps, Git

Lead Big Data Engineer

NPD - Point Of Sale
CHENNAI, India
07.2020 - 03.2022

NPD POSP holds billions of POS records that come from various retailers across multiple countries. Using the Cloudera Hadoop system, the team performs ETL (cleansing, filtering, etc.) and applies various compute techniques, including imputation, aggregation, projection, and suppression, then publishes the final delivery data to a DataMart built on a Sybase database for retail analytics.

  • Involved in the complete big data flow of the application, from data ingestion from upstream into HDFS through processing and analyzing the data in HDFS.
  • Hands on experience in designing, developing, and maintaining software solutions in Hadoop cluster.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and Spark on YARN.
  • Worked on POCs with Apache Spark using Scala to implement Spark in the project.
  • Built scalable distributed data solutions using the Hadoop Cloudera Distribution.
  • Load and transform large sets of structured, semi structured and unstructured data.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregating Functions (UDAFs) written in Python.
  • Developed complex MapReduce streaming jobs in Java, integrated with Hive and Pig, and wrote MapReduce programs in Java to perform various ETL tasks.
  • Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports per users' needs.
  • Involved in creating Hive tables, loading structured data, and writing Hive queries that run internally as MapReduce jobs.
  • Involved in converting Hive/SQL queries into Spark Transformations using Spark RDDs and Scala.
  • Involved in using SQOOP for importing and exporting data between RDBMS and HDFS.
  • Involved in developing Hive DDLs to create, alter, and drop Hive tables.
  • Involved in loading data from Linux file system to HDFS.
  • Created Pig scripts to load, transform, and store data from various sources into Hive tables.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Developed highly maintainable Hadoop code and followed all best practices regarding coding.
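
The Hive partitioning and bucketing mentioned above can be illustrated with a HiveQL sketch; the table and column names are hypothetical, not the project's actual schema:

```sql
-- Hypothetical POS table partitioned by country and load date,
-- bucketed by retailer_id to speed up joins and sampling
CREATE EXTERNAL TABLE pos_sales (
    retailer_id BIGINT,
    item_id     BIGINT,
    units       INT,
    revenue     DECIMAL(18,2)
)
PARTITIONED BY (country STRING, load_date STRING)
CLUSTERED BY (retailer_id) INTO 32 BUCKETS
STORED AS ORC;

-- Dynamic partition insert from a staging table
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE pos_sales PARTITION (country, load_date)
SELECT retailer_id, item_id, units, revenue, country, load_date
FROM pos_sales_staging;
```

Partition pruning lets queries filtered on country/load_date skip irrelevant directories, while bucketing on the join key enables bucket map joins.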

Technologies: Hadoop, MapReduce, Java, Scala, Spark, Hive, Pig, Sqoop, Python, Kafka, Cloudera, Scala IDE (IntelliJ), Maven, Ant, HDFS

Lead Software Engineer

GnGn: E-Learning Platform
CHENNAI, India
08.2016 - 07.2020

GnGn, headquartered in Tokyo, Japan, was founded in 2013 and is a private Platform-as-a-Service (PaaS) company providing an online tutoring PaaS. According to GGE, its customers include millions of university students and corporate employees in Japan and the Philippines, served through 3 branches across the two countries. The GnGn platform has various applications:

SaaS Tutoring App: The tutoring platform is hosted on AWS as a Node.js/Express.js app with a React.js front-end and MySQL as the database. It has an online WebRTC application that allows students to interact with tutors. Most sessions are one-on-one; however, some are in online group settings. The tutors are all over the world, and most students are from Japan, South Korea, China, Hong Kong, and the Philippines. The app has Admin and Tutor logins as well as a Management login to track real-time status.

Tutor Analytics: All sessions have video, audio, and online chat data that are pumped from MySQL to an on-premises HDFS cluster of 22 nodes with 800 TB of data. The data pipeline is built on HBase-based Python code that aggregates, correlates, imputes, and cleans the data; the summary is pumped into an OLAP MySQL database, on which Fluentd and later Kibana are used to build dashboards and analytics about tutors' performance for management. The Elastic Stack is used for indexing the output of the data pipeline.
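
The cleansing, imputation, and summarization steps of this pipeline can be sketched in plain Python; this is an illustrative simplification of the HBase-based jobs, and the field names and mean-imputation rule are assumptions:

```python
def summarize_sessions(sessions):
    """Clean tutor session records, impute missing durations with the
    mean of observed ones, and summarize per tutor (illustrative)."""
    # Cleansing: keep only records that have a tutor id
    cleaned = [s for s in sessions if s.get("tutor")]
    observed = [s["duration"] for s in cleaned if s.get("duration") is not None]
    mean = sum(observed) / len(observed) if observed else 0
    summary = {}
    for s in cleaned:
        # Imputation: fill missing durations with the observed mean
        d = s["duration"] if s.get("duration") is not None else mean
        tutor = summary.setdefault(s["tutor"], {"sessions": 0, "minutes": 0.0})
        tutor["sessions"] += 1
        tutor["minutes"] += d
    return summary

sessions = [
    {"tutor": "t1", "duration": 30},
    {"tutor": "t1", "duration": None},  # missing duration -> imputed
    {"tutor": "t2", "duration": 50},
    {"tutor": None, "duration": 20},    # no tutor id -> dropped in cleansing
]
print(summarize_sessions(sessions))
```

The production jobs performed the same cleanse/impute/summarize pattern at scale against HBase before loading the OLAP MySQL database.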

Tablet Apps for Tuitions: There are Android and iOS applications built using React Native that help students and tutors work in remote settings.

Responsibilities:

GnGn Analytics:

  • Developed a data pipeline using Apache Spark and Python on HDFS and processed user and tutor data.
  • Transported operational data from the Tutor SaaS MySQL database to the cluster using Spark and Sqoop.
  • Built HBase-based Python jobs in the data pipeline for cleansing, aggregation, and correlation.
  • Developed several analytics dashboards in Fluentd and Kibana.

GnGn SaaS:

  • Assisted with developing the app on the MySQL/Node.js/Express.js/React.js stack.
  • Developed the WebRTC client for online chat, video, and audio sessions.
  • Developed all APIs using GraphQL with Node.js and TypeScript.
  • Designed and developed the front-end using React.js.
  • Delivery and team management.
  • Client liaison.
  • AWS: EC2, RDS, VPC, ECS, S3.

Technologies: HDP cluster, HBase, Sqoop, Flume, Python, GitHub, Node.js, React.js, Fluentd, AWS, EC2, Load Balancers, CloudFront, Swagger, Memcached, Munin, Redmine, Elastic Stack, Kibana

Senior Software Engineer

GNGN: Gge.co.jp
Chennai, India
10.2014 - 07.2016

GNGN: gge.co.jp (January 2013 to July 2014, 1.5 years). Team size: 8

GGE, headquartered in Tokyo, Japan, was founded in 2013 and is a private Platform-as-a-Service (PaaS) company providing an online tutoring PaaS. According to GGE, its customers include thousands of university students and corporate employees in Japan and the Philippines, served through 3 branches across the two countries. The platform was developed for tutoring spoken English courses through Skype video calls. Many experienced spoken-English teachers and students are registered on the site; teachers can update their schedules, and students can book classes in advance. The platform is built on Node.js with front-end frameworks such as jQuery, JavaScript, and Underscore.js, and uses Fluentd and Munin for log management and Memcached for caching. I was responsible for all modules in the back-end as well as the front-end, including server maintenance, and we developed many new modules to increase the app's performance.

The platform can be embedded easily into CMS platforms such as WordPress. We included about 20 subsites on WordPress for corporates and universities by white-labeling the platform for them. Each subsite has separate functions and features, developed for separate educational uses such as schools, colleges, and companies. We also built WordPress plugins for each of these subsites using the GGE Platform's RESTful API to customize it for their needs.

Responsibilities:

  • Involved in various stages of Software Development Life Cycle (SDLC) deliverables of the project using the Agile software development methodology.
  • Full-stack web development using Java, PHP, Python, and JavaScript.
  • Development of RESTful APIs on Java/Spring Boot.
  • AWS IaaS management: EC2, IAM, VPC, RDS, S3, CloudFront, Load Balancer.
  • Front-end development using JavaScript frameworks.
  • MySQL database management.
  • Reviewed project specifications and designed technology solutions that met or exceeded performance expectations.
  • Coordinated with other engineers to evaluate and improve software and hardware interfaces.
  • Worked with software development and testing team members to design and develop robust solutions to meet client requirements for functionality, scalability, and performance.
  • Analyzed proposed technical solutions based on customer requirements.


Software Engineer

Safetown
Chennai, India
09.2013 - 10.2014

Safetown is a Platform as a Service (PaaS) for consumer safety. The platform is a powerful, easy-to-use suite of APIs that can be integrated with any web-based or mobile app, empowering users to share information with local law enforcement, fire, emergency services, and other citizens to make their community a better, safer place to live. The platform allows citizens to see 911 events in real time and report crime through mobile and cloud apps. It is hosted in an auto-scalable environment on AWS using Amazon Elastic Beanstalk. The app has connectors, built in C, to the CAD (Computer-Aided Dispatch) software that manages emergency calls. The back-end is built on the Jersey framework with MongoDB as the data store, and the front-end is built on a WordPress site. The project also has iOS and Android apps so users can see incidents from their smartphones and get updates quicker.

We developed custom WordPress plugins and established a network of sites using the multisite feature, so there is a unique SafeTown site for more than 100 counties in the US. I worked with various technologies on this project, including Java, the Jersey framework, WordPress plugins in PHP, and jQuery-based front-end development, while supporting the mobile team with the API. We developed WordPress plugins in PHP for modules such as Community Alerts, Jail Management, and Household Profile using the platform's API, written in Java on the Jersey framework.

Responsibilities:

  • Development of RESTful APIs using a Node.js framework.
  • Front-end development with Underscore.js and Bootstrap.
  • Development of custom WordPress plugins in PHP and JavaScript.
  • AWS IaaS management: EC2, IAM, VPC, RDS, S3, CloudFront, Load Balancer.

Technologies:

GitHub, AWS Cloud, Elastic Beanstalk, Node.js, RESTful API, MongoDB, JavaScript, jQuery, Underscore.js, Java, PHP, HTML, CSS, Git

Education

Bachelor of Science - Computer Information Systems

Karnataka Open University
India
05.2015

Diploma - Electronics And Communication

Central Polytechnic College
India
04.2006

Skills

  • Programming Languages: SQL, Java, Python, Scala, PHP, Pig Latin, HiveQL, and UNIX shell scripting
  • Big Data Technologies: Azure Databricks, Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Kafka, Oozie, Airflow, Spark SQL, PySpark, Flume, Zookeeper, Spark, Data Factory, Cloudera, Delta Lake
  • Cloud Technologies: AWS, EC2, S3, VPC, Lambda, Redshift, EMR, Snowflake, Databricks
  • Databases: Oracle, MySQL, PostgreSQL; familiar with HBase
  • Scripting & Query Languages: UNIX shell scripting, SQL, and PL/SQL
  • Web Technologies: React, ES8, TypeScript, CSS, HTML, JavaScript, AJAX, JDBC
  • Hadoop Paradigms: MapReduce, YARN, in-memory computing, high availability, real-time streaming
  • Operating Systems: Windows, UNIX, Linux distributions (CentOS, Ubuntu), macOS

Timeline

Lead Big Data Engineer

NPD Group
03.2022 - Current

Lead Big Data Engineer

NPD - Point Of Sale
07.2020 - 03.2022

Lead Software Engineer

GnGn: E-Learning Platform
08.2016 - 07.2020

Senior Software Engineer

GNGN: Gge.co.jp
10.2014 - 07.2016

Software Engineer

Safetown
09.2013 - 10.2014

Bachelor of Science - Computer Information Systems

Karnataka Open University

Diploma - Electronics And Communication

Central Polytechnic College