Hassan Qureshi

Tampa, FL

Summary

Experienced in designing and optimizing data pipelines to ensure seamless data flow. Uses advanced SQL and Python skills to create and maintain robust data architectures. Track record of implementing scalable solutions that enhance data integrity and support informed decision-making.

Overview

12 years of professional experience
1 Certification

Work History

Principal Hadoop Architect

Citibank
Tampa, FL
11.2019 - Current
  • Worked on a Java Spark pre-certification project that processes large volumes of files for data checking.
  • The Spark code writes mortgage data to a limited set of files before it goes into the Genesis application.
  • Developed Java/Angular code to read data from the Hadoop cluster and display it on the internal website DQV AL
  • Worked on another project to ingest data into the Hadoop cluster: read data from SQL Server, sent the query results to a Kafka server, then used the Olympus framework to load the data into Hive, Elasticsearch, and HBase
  • Developed an application using Java Spring to read data from Hadoop and Oracle tables and pass it to the front end through a REST API
  • Experienced in Cloudera/Hortonworks stack, specifically Hive LLAP, YARN, and HDFS.
  • Performance tuning, storage rationalization, and lifecycle management in the Cloudera cluster
  • Scheduled PySpark jobs to run on specific business days
  • Developed an internal website using Angular components to read files and process them on a 2-node Hadoop cluster
  • Worked in Ab Initio development and production support
  • Provided support to users on Ab Initio Mhub, Express IT, Query IT, Control Centre, and DCS
  • Developed and applied changes to files to enable data loads into Mhub
  • Worked with Atlas and Ranger.
  • Performed AUX uploads and file ingestions in Ab Initio to load Excel files as datasets in Mhub
  • Developed Oracle SQL scripts to insert data and wrote queries to read data for presentation in the UI
  • Technology tools: Visual Studio, TFS, MS Word

Lead Data Engineer

TD Bank
Toronto, ON
07.2019 - 10.2019
  • The role-based access project controlled elevated access throughout the production support platform, giving support staff limited access based on job duties. Developed Python and Bash scripts to run diagnostic commands without using root
  • Developed Python code on a Linux machine to summarize root and elevated account usage
  • Created a Bash script to run a specific set of commands used by support staff
  • Assisted support staff in operations to resolve the most common issues using scripts
  • Worked with Atlas and Ranger.
  • Created Bash scripts for the tenant space directory and Hive SQL DDL
  • Configured Sentry roles and FACL permissions on the tenant space directory
  • Used Cloudera Manager to define roles and permissions for groups
  • Technology tools: Hadoop, Python, Unix
  • Key achievements: Successfully implemented PySpark code to create permissions for users

Lead Data Engineer

Morneau Shepell
Markham, ON
07.2018 - 06.2019
  • Morneau Shepell serves company accounts with services for job benefits, absence, disability, and stress support. My goal was to create a data lake by importing data from different lines of business using DataStage and Sqoop. The project studied correlations between events across the different databases, specifically mapping employees of each organization using Spark and building machine learning models.
  • Performed a fresh install of a 10-node Hadoop HDP 2.6 cluster
  • Installed Hadoop components such as Spark, YARN, and HBase and maintained replication between servers
  • Completed setup, scaling, upgrading, and migration of the Hadoop cluster from scratch
  • Installed and configured IBM DataStage 11.5 and used it to import data from SQL Server to HDFS
  • Created ML models and PySpark jobs to study correlations between different databases
  • Used Sqoop to import SQL tables into Hive
  • Understanding of basic troubleshooting, system capacity, and the basics of memory, OS, and networks
  • Experience with YARN, the YARN scheduler, and ZooKeeper
  • Technology tools: Hadoop admin, Python, Unix, DataStage
  • Key achievements: Successfully installed the Hadoop cluster and implemented ETL using DataStage

Lead Data Engineer

TD Bank
Toronto, ON
04.2016 - 06.2018
  • Role and contribution: Worked on the HDP 2.3 distribution for the development cluster
  • Used the Hadoop ecosystem, Java, Linux, Hive SQL, and MapReduce to process data
  • Wrote MapReduce jobs in Java to process XML files
  • Coded in Java Spring and JUnit, with source in Bitbucket
  • Developed and maintained ETL jobs for the data warehouse
  • Monitored Hadoop cluster performance in Cloudera 5.7
  • Experience with Jenkins
  • Experience in Unix scripting to process files in lower environments
  • Worked in an Agile environment with Git
  • Experience building a 10-node Hadoop cluster and providing production support for cluster maintenance
  • TD Bank provides banking services to millions of customers, including credit card services. The ECRR project focused on processing banking transactions to score customer rankings. The ECRR application takes XML files from the mainframe and converts them into husky values.

Data Engineer

CIBC Bank
Toronto, ON
02.2014 - 03.2016
  • Developed Hive ETL code based on business requirement documents
  • Tested and analyzed Hive ETL
  • Fixed defects in previously promoted Hive code
  • Followed the business plan to create Hive ETL in Hadoop
  • Used an on-prem Cloudera 5.5 cluster
  • Modified existing data model by adding additional summary tables to greatly reduce the runtime of dashboards and reports
  • Used Beeline, Impala, and Kerberos to write Hive queries
  • Used Hadoop tools Hue, Beeline, Impala, Hive, Sqoop, and Kerberos to write Hive ETL code
  • Checked in code with Team Foundation Server using Eclipse and Visual Studio
  • Used Toad for Oracle to view data in the Oracle database

Education

Bachelor of Science (Honors)

University of Guelph
Canada
01.2008

Skills

  • Data Engineer
  • 10 plus years of experience
  • Created Unix scripts and PySpark code to create L3 transformed tables
  • Developed Hive DDL scripts and transformations in Ab Initio and PySpark
  • Developed Oracle SQL scripts to insert data and wrote queries to read data for presentation in the UI
  • Domain: Banking
  • Programming Languages: Java, Python, Unix scripting, Oracle, SQL, Ab Initio
  • Operating System / ERP Version: Unix / Windows
  • Tools / DB / Packages / Framework / ERP Components: Hands-on experience with major components of the Hadoop ecosystem, including Cloudera Hive SQL, HBase, HBase-Hive integration, Pig, Sqoop, IBM DataStage, and Flume, plus knowledge of the Mapper/Reducer and HDFS frameworks
  • Hardware Platform: Intel Series
  • ETL development
  • Data modeling
  • Data pipeline design
  • Data warehousing
  • SQL expertise
  • Hadoop ecosystem
  • Problem-solving
  • Organizational skills
  • Amazon Redshift

Certification

  • Professional Certification: Databricks Certified Spark Developer 3.0
  • Professional Certification: Linux Foundation Certified IT Associate (LFCA)
  • Professional Certification: Kubernetes and Cloud Native Associate (KCNA)
