ALI SAFDAR NAIF

Dallas, Texas

Summary

  • 8 years of IT experience in Big Data Hadoop and Spark development.
  • Experience with Hadoop ecosystem components such as HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Scala, Flume, Kafka, Oozie, Java, and HBase.
  • Working knowledge of distributed systems architecture and parallel processing frameworks, with an in-depth understanding of the Spark execution model and the internals of the MapReduce framework.
  • Good working experience developing production-ready Spark applications using the Spark-Core, DataFrame, Spark-SQL, and Spark-Streaming APIs (a brief sketch follows this list).
  • Experience with Hadoop distributions such as Cloudera (CDH4 and CDH5).
  • Worked extensively on fine-tuning resources for long-running Spark applications to achieve better parallelism and more executor memory for caching.
  • Good experience working with both batch and real-time processing using Spark frameworks.
  • Proficient in Apache Spark and Scala programming for analyzing large datasets and processing real-time data.
  • Good working knowledge of developing Pig Latin scripts and using Hive Query Language.
  • Good working experience in performance tuning of Hive queries and troubleshooting issues related to joins and memory exceptions in Hive.
  • Good understanding of partitioning and bucketing concepts in Hive; designed both internal and external Hive tables to optimize performance.
  • Good experience using different file formats such as Avro, RCFile, ORC, and Parquet.
  • Good working experience optimizing MapReduce algorithms using combiners and custom partitioners.
  • Experience with NoSQL column-oriented databases such as HBase, Cassandra, and MongoDB, and their integration with Hadoop clusters.
  • Experience with scripting languages such as Shell/Bash.
  • Experience with data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
  • In-depth knowledge of Hadoop architecture and components such as HDFS, NameNode, DataNode, Secondary NameNode, and the MapReduce programming paradigm.
  • Worked with Sqoop to move (import/export) data between relational databases and Hadoop.
  • Well versed in Agile/Scrum working environments using JIRA and version control tools such as Git.
  • Flexible, enthusiastic, and project-oriented team player with excellent communication skills.
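
A minimal illustrative sketch of the Spark-Core/DataFrame/Spark-SQL batch pattern listed above, written in PySpark; the application name, paths, column names, and view name are hypothetical, not taken from any specific project:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical application name and HDFS paths.
    spark = SparkSession.builder.appName("batch-aggregation").getOrCreate()

    # Read raw events stored as Parquet (one of the file formats listed above).
    events = spark.read.parquet("hdfs:///data/raw/events")

    # DataFrame API: filter and aggregate.
    daily_counts = (events
                    .filter(F.col("status") == "COMPLETED")
                    .groupBy("event_date")
                    .agg(F.count("*").alias("completed_count")))

    # Spark-SQL API: the same dataset queried through a temporary view.
    events.createOrReplaceTempView("events")
    top_users = spark.sql(
        "SELECT user_id, COUNT(*) AS event_count "
        "FROM events GROUP BY user_id ORDER BY event_count DESC LIMIT 10")

    # Persist curated output back to HDFS.
    daily_counts.write.mode("overwrite").parquet("hdfs:///data/curated/daily_counts")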

Overview

10 years of professional experience

Work History

Data Engineer

Principal Financial Group
04.2022 - Current
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Working with the Product Owner and other engineering team members, designed and developed end-to-end Document Capture solutions.
  • Performed engineering tasks during each sprint, including coding, testing, debugging, and deployment.
  • Collaborated with business SMEs to define capture flows, data models, and business rules.
  • Created and maintained technical documentation, including architecture, design, and testing artifacts.
  • Developed AWS serverless solutions using Python and AWS services such as AWS CDK/CFT, API Gateway, Lambda, Step Functions, S3, SNS, and EventBridge.
  • Delivered predictions, analysis, and visualization for multi-factor performance data sets.
  • Working with UX colleagues, developed modern, responsive user interfaces where required.
  • Worked with Git and related tools (e.g., GitHub, Bitbucket).
  • Maintained and monitored the risk modeling infrastructure from a business perspective.
  • Met CI/CD requirements for automated testing, code coverage, and code quality.
  • As part of an Agile team, identified and recommended continuous improvement opportunities.
  • Developed and maintained risk models as prescribed by the regulator.
  • Gained in-depth knowledge of diverse and emerging technologies, architectural concepts, and principles.
  • Provisioned and supported cloud applications on AWS and Azure, and provided enterprise IT support (server, LAN/WAN, phones, security); active DevOps and open-source community member; worked with container platforms (Docker, Kubernetes, Fargate); built and configured AWS services to meet standards, architecture patterns, and operational policies.
  • Built real-time ingestion, calculation engines, and reporting solutions for the Risk and Compliance organization.
  • Immersed in Scala/Python engineering to improve data visibility and uncover viable channels for improvement.
  • Participated in cloud and application security, infrastructure logging, and application monitoring solutions (AWS GuardDuty, KMS, New Relic, Packer).
  • Installed, configured, and deployed various resources for cross-team shares in JFrog Artifactory.
  • Involved in setting up a Jenkins master and multiple slaves for multiple teams for CI/CD purposes.
  • Worked on creating Docker containers and Docker consoles for managing application lifecycles.
  • Migrated various databases to Snowflake, building ETL/ELT pipelines into and out of Snowflake's data warehouse using a combination of SnowSQL and Python, with hands-on experience using Snowpipe (a brief sketch follows this list).
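
A minimal sketch of the Snowflake load step mentioned in the last bullet, assuming the snowflake-connector-python package; the account, credentials, warehouse, stage, and table names are hypothetical, and real credentials would come from a secrets manager:

    import snowflake.connector

    # All connection values are placeholders injected from configuration in practice.
    conn = snowflake.connector.connect(
        account="example_account",
        user="ETL_SERVICE_USER",
        password="***",
        warehouse="LOAD_WH",
        database="ANALYTICS",
        schema="RAW",
    )
    try:
        cur = conn.cursor()
        # Bulk-load files already staged in a Snowflake stage into a raw table,
        # mirroring the ELT pattern of landing data first and transforming inside Snowflake.
        cur.execute(
            "COPY INTO RAW.EVENTS FROM @EVENTS_STAGE FILE_FORMAT = (TYPE = PARQUET)"
        )
    finally:
        conn.close()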

Data Engineer

California Governor's Office of Emergency Services
08.2021 - 04.2022
  • Performed Sqoop-based transfers through HBase tables to process data into several NoSQL databases (Cassandra, MongoDB).
  • Implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra.
  • Developed an ETL workflow that pushes webserver logs to an Amazon S3 bucket.
  • Implemented Cassandra connections with Resilient Distributed Datasets (local and cloud).
  • Imported and exported data into HDFS and Hive.
  • Implemented ETL code to load data from multiple sources into HDFS using Pig scripts.
  • Implemented Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data on HDFS.
  • Worked on Talend ETL scripts to pull data from TSV files and an Oracle database into HDFS.
  • Worked extensively on the design, development, and deployment of Talend jobs to extract, filter, and load data into the data lake.
  • Extracted data from source systems and transformed it into newer systems using Talend DI components.
  • Worked on Storm to handle parallelization, partitioning, and retrying on failures, and developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark-SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
  • Supported MapReduce programs running on the cluster.
  • Experienced with Jira, Bitbucket, source control systems like Git and SVN, and development tools like Jenkins and Artifactory.
  • Worked on RDBMSs such as Oracle, DB2, SQL Server, and MySQL.
  • Developed workflows to cleanse and transform raw data into useful information and load it into a Kafka queue, from which it is loaded into HDFS and NoSQL databases (see the sketch after this list).
  • Responsible for sanity testing of the system once code was deployed to production.
  • Experienced in using IDEs like Eclipse and IntelliJ IDEA to modify code in Git.
  • Involved in quality assurance of the data mapped into production.
  • Involved in code walkthroughs, reviews, testing, and bug fixing.
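
A minimal sketch of the "cleanse, then publish to a Kafka queue" step described above, using the kafka-python client; the broker address, topic name, and record layout are hypothetical:

    import json
    from kafka import KafkaProducer

    # Hypothetical broker and JSON serialization for cleansed records.
    producer = KafkaProducer(
        bootstrap_servers=["broker:9092"],
        value_serializer=lambda record: json.dumps(record).encode("utf-8"),
    )

    def publish_cleansed(record):
        # A record that has already passed the cleansing/transformation step;
        # downstream consumers load the topic into HDFS and NoSQL stores.
        producer.send("cleansed-events", value=record)

    publish_cleansed({"source": "webserver", "status": 200, "path": "/index.html"})
    producer.flush()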

Sr. BigData Engineer

DXC Technology
03.2020 - 07.2021
  • Worked on a live 65-node Hadoop cluster running CDH4.7.
  • Worked with highly unstructured and semi-structured data of 70 TB in size (210 TB with a replication factor of 3).
  • Experience in the AWS cloud environment with S3 storage and EC2 instances.
  • Assisted in upgrading, configuring, and maintaining various Hadoop infrastructure components like Pig, Hive, and HBase.
  • Configured Flume to capture news from various sources for testing the classifier.
  • Experience in developing MapReduce jobs using various input and output formats.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing, analyzing, and training the classifier using MapReduce, Pig, and Hive jobs.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Involved in loading data into the Cassandra NoSQL database.
  • Developed Spark applications to move data into Cassandra tables from various sources such as relational databases or Hive.
  • Worked on Spark Streaming, which collects data from Kafka in near real time, performs the necessary transformations and aggregations on the fly to build the common learner data model, and persists the data in Cassandra (see the sketch after this list).
  • Developed Hadoop solutions on AWS, from developer to admin roles, utilizing the Hortonworks Hadoop stack.
  • Managed RHEL/AWS role-based security and Hadoop admin load balancing on an AWS EC2 cluster.
  • Delivered a migration solution from on-premises to AWS using Sqoop, Pig, and AWS cloud services.
  • Worked on Cassandra data modeling, NoSQL architecture, and DSE Cassandra database administration: keyspace creation, table creation, secondary and Solr index creation, and user creation and access administration.
  • Experience in performance tuning a Cassandra cluster to optimize writes and reads.
  • Developed Python scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMSs through Sqoop.
  • Loaded data into Spark RDDs and performed in-memory data computation to generate the output response.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
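
A minimal sketch of the Kafka-to-Cassandra streaming path described above, in PySpark; it assumes the DataStax spark-cassandra-connector package is available on the cluster, and the topic, event schema, keyspace, and table names are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, LongType

    spark = SparkSession.builder.appName("kafka-to-cassandra").getOrCreate()

    # Illustrative schema for the learner events; field names are assumptions.
    schema = StructType([
        StructField("learner_id", StringType()),
        StructField("course_id", StringType()),
        StructField("event_ts", LongType()),
    ])

    stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
              .option("subscribe", "learner-events")             # hypothetical topic
              .load()
              .select(from_json(col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    def write_batch_to_cassandra(batch_df, batch_id):
        # foreachBatch writes each micro-batch through the Cassandra batch connector.
        (batch_df.write
         .format("org.apache.spark.sql.cassandra")
         .options(keyspace="analytics", table="learner_model")   # hypothetical names
         .mode("append")
         .save())

    query = stream.writeStream.foreachBatch(write_batch_to_cassandra).start()
    query.awaitTermination()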

BigData Engineer

EPAM
01.2019 - 02.2020
  • Developed workflows for the complete end-to-end ETL process: getting data into HDFS, validating and applying business logic, storing clean data in Hive external tables, exporting data from Hive to RDBMS sources for reporting, and escalating data quality issues.
  • Worked as onsite coordinator, providing technical assistance, troubleshooting, and alternative development solutions.
  • Handled importing of data from various data sources, performed transformations using Spark, and loaded data into Hive.
  • Involved in performance tuning of Hive (ORC tables) from design, storage, and query perspectives.
  • Developed and deployed using Hortonworks HDP 2.3.0 in production and HDP 2.6.0 in the development environment.
  • Worked extensively with Sqoop for importing and exporting data between HDFS and relational database systems.
  • Worked on developing Pig scripts to create relationships between the data present in the Hadoop cluster.
  • Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark with Python and Scala.
  • Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse.
  • Implemented a large Lambda Architecture using Azure data platform capabilities such as Azure Data Lake, Azure Data Factory, HDInsight, Azure SQL, Azure SQL Server, and Power BI.
  • Experience in implementing OLAP multi-dimensional cube functionality using Azure SQL Data Warehouse.
  • Worked on implementing a data lake and was responsible for data management within it.
  • Developed Ruby scripts to map data to the production environment.
  • Experience in analyzing data using Hive, HBase, and custom MapReduce programs.
  • Developed Hive UDFs and Pig UDFs using Python scripts (see the sketch after this list).
  • Experienced in working with the IBM Data Science tool and responsible for ingesting the processed data into it.
  • Strong knowledge of clinical information systems/EMRs/EHRs and health care files such as PGF, HL7, 837, and claims.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
  • Worked on the Oozie Workflow Engine to run workflow jobs with actions that run Hadoop MapReduce and Pig jobs.
  • Responsible for configuring the cluster in IBM Cloud and maintaining the number of nodes as per requirements.
  • Developed Kafka consumers to consume data from Kafka topics.
  • Developed shell scripts for running Hive scripts in Hive and Impala.
  • Responsible for optimization of data ingestion, data processing, and data analytics.
  • Expertise in developing PySpark applications that connect HDFS and HBase and allow data transfer between them.
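
A minimal sketch of a Hive UDF implemented as a Python script, as mentioned above; Hive's TRANSFORM clause streams rows through the script over stdin/stdout, and the script name and column layout are hypothetical:

    #!/usr/bin/env python
    # Invoked from HiveQL roughly as:
    #   ADD FILE normalize_status.py;
    #   SELECT TRANSFORM(user_id, raw_status) USING 'python normalize_status.py'
    #          AS (user_id, status) FROM staging_events;
    import sys

    for line in sys.stdin:
        user_id, raw_status = line.rstrip("\n").split("\t")
        # Normalize free-text status values into a small controlled vocabulary.
        status = raw_status.strip().upper() or "UNKNOWN"
        print("\t".join([user_id, status]))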

Data Analyst

Hilton
05.2017 - 12.2018
  • Involved in complete project life cycle starting from design discussion to production deployment.
  • Worked closely with the business team to gather their requirements and new support features.
  • Developed a 16-node cluster while designing the data lake on the Cloudera distribution.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Implemented and configured High Availability Hadoop Cluster.
  • Installed and configured Hadoop Clusters with required services (HDFS, Hive, HBase, Spark, Zookeeper).
  • Developed Hive scripts to analyze data; PHI was categorized into different segments, and promotions were offered to customers based on those segments.
  • Extensive experience in writing Pig scripts to transform raw data into baseline data.
  • Developed UDFs in Java as and when necessary to use in Pig and HIVE queries.
  • Worked on Oozie workflow engine for job scheduling.
  • Created Hive tables, partitions and loaded the data to analyze using HiveQL queries.
  • Created different staging tables like ingestion tables and preparation tables in Hive environment.
  • Optimized Hive queries and used Hive on top of Spark engine.
  • Worked on sequence files, map-side joins, bucketing, and static and dynamic partitioning for Hive performance enhancement and storage improvement (see the sketch after this list).
  • Experience in retrieving data from Oracle using PHP and Java programming.
  • Tested Apache TEZ, an extensible framework for building high performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Created tables in HBase to store variable data formats coming from different upstream sources.
  • Experience in managing and reviewing Hadoop log files.
  • Good understanding of ETL tools and how they can be applied in a Big Data environment.
  • Followed Agile Methodologies while working on the project.
  • Bug fixing and 24/7 production support for running processes.
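
A minimal sketch of the Hive partitioning and bucketing approach noted above, written in PySpark with Hive support enabled; the database, table, path, and column names are hypothetical:

    from pyspark.sql import SparkSession

    # Hive support lets saveAsTable register partitioned/bucketed tables in the metastore.
    spark = (SparkSession.builder
             .appName("hive-partitioned-load")
             .enableHiveSupport()
             .getOrCreate())

    stays = spark.read.parquet("hdfs:///data/staging/stays")  # hypothetical staging path

    # Partition by ingest date and bucket by customer id to improve query and join performance.
    (stays.write
     .partitionBy("ingest_date")
     .bucketBy(32, "customer_id")
     .sortBy("customer_id")
     .format("orc")
     .mode("overwrite")
     .saveAsTable("analytics.stays_prepared"))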

Java Developer

Edassist
09.2016 - 04.2017
  • Developed the system by following Agile methodology.
  • Involved in the implementation of design using vital phases of the software development life cycle, including development, testing, implementation, and maintenance support.
  • Experience in Agile programming and accomplishing the tasks.
  • Used Ajax and JavaScript to handle asynchronous requests and CSS to handle the look and feel of the application.
  • Involved in design of class Diagrams, sequence Diagrams and Event Diagrams as a part of Documentation.
  • Developed the presentation layer using CSS and HTML from Bootstrap to support multiple browsers, including mobiles and tablets.
  • Extended standard action classes provided by the Struts framework for appropriately handling client requests.
  • Configured Struts tiles for reusing view components as an application of J2EE composite pattern.
  • Developed code for obtaining bean references via Dependency Injection (DI/IoC) in the Spring framework.
  • Developed the application on Eclipse.
  • Mapped the MVC model to an Oracle relational data model with a SQL-based schema.
  • Developed SQL queries and Stored Procedures using PL/SQL to retrieve and insert into multiple database schemas.

Java Web Developer

HDFC
06.2013 - 12.2015
  • Involved in design and development phases of Software Development Life Cycle (SDLC).
  • Involved in designing UML use case diagrams, class diagrams, and sequence diagrams using Rational Rose.
  • Built a revenue-generating Java-based web application using Java/J2EE technologies.
  • Participated in development as well as integration of, and enhancements to, existing products.
  • Fixed bugs and supported existing websites.
  • Used Agile methodology and Scrum meetings to track, optimize, and tailor features to client requirements.
  • User help tooltips implemented with Dojo Tooltip Widget with multiple custom colors.
  • Developed user interfaces using JSP, JSP Tag Libraries, and JavaScript to simplify the complexities of the application.
  • Implemented Model View Controller (MVC) architecture using the Jakarta Struts framework at the presentation tier.
  • Developed a Dojo-based front end, including forms and controls, and programmed event handling.
  • Implemented SOA architecture with web services using JAX-RS (REST) and JAX-WS (SOAP).
  • Developed various Enterprise Java Bean components to fulfill the business functionality.
  • Created Action classes that route submittals to appropriate EJB components and render retrieved information.
  • Participated in analysis, design, build, unit testing, deployment, and support of the systems.

Education

Bachelor of Engineering - Computer Science

Osmania University

Master of Science - Computer Science

Texas A&M University

Skills

  • Hadoop/Big Data: HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Flume, Oozie, Cassandra, YARN, Zookeeper, Spark SQL, Apache Spark, Impala, Apache Drill, Kafka, Elastic MapReduce
  • Hadoop Frameworks: Cloudera CDH, Hortonworks HDP, MapR
  • Java & J2EE Technologies: Core Java, Servlets, Java API, JDBC, Java Beans
  • IDEs and Tools: Eclipse, NetBeans, Maven, ANT, Hue (Cloudera-specific), Toad, Sonar, JDeveloper
  • Frameworks: MVC, Struts, Hibernate, Spring
  • Programming Languages: C, Java, Scala, Python, Linux shell
  • Web Technologies: HTML, XML, DHTML, HTML5, CSS, JavaScript
  • Databases: MySQL, DB2, MS SQL Server, Oracle
  • NoSQL Databases: HBase, Cassandra, MongoDB
  • Methodologies: Agile Software Development, Waterfall
  • Version Control Systems: GitHub, SVN, CVS, ClearCase
  • Operating Systems: RedHat Linux, Ubuntu Linux, Windows XP/Vista/7/8/10, Sun Solaris, SUSE Linux

Timeline

Data Engineer

Principal Financial Group
04.2022 - Current

Data Engineer

California Governor's Office of Emergency Services
08.2021 - 04.2022

Sr. BigData Engineer

DXC Technology
03.2020 - 07.2021

BigData Engineer

EPAM
01.2019 - 02.2020

Data Analyst

Hilton
05.2017 - 12.2018

Java Developer

Edassist
09.2016 - 04.2017

Java Web Developer

HDFC
06.2013 - 12.2015

Bachelor of Engineering - Computer Science

Osmania University

Master of Science - Computer Science

Texas A&M University