AFRAN HOSSAIN

Jackson Heights, NY

Summary

Data Engineer with about six years of experience and broad exposure to a wide range of tools, technologies, and services, with plenty of room still to learn and grow. Highly motivated and communicative, with a genuine love of learning anything new, especially in the Data Engineering field.

Overview

6 years of professional experience

Work History

PySpark Developer/Lead Software Engineer

Wells Fargo
11.2022 - Current
  • Created logical data models with the team to define relationships between data entities and understand how data would be integrated from various source systems.
  • Utilized PySpark to read data from CSV, Parquet, and JSON files.
  • Used PySpark DataFrames and Spark SQL to clean, transform, and prepare raw data for analysis.
  • Developed a Python batch process that allowed the team to differentiate between customers.
  • Worked with AutoSys to schedule PySpark jobs and ensure they ran at specific times, which enhanced onshore and offshore team collaboration.
  • Migrated raw data from various source systems into the Data Lake.
  • Wrote DDL scripts for the tables created in the DEV, SIT, and UAT environments.
  • Created Hive tables on the server (accessed via PuTTY), enabling Spark jobs to complete the batch process.
  • Worked with an in-house data control framework to perform different levels of data validation.
  • Generated technical design documents for completed development to review with the business team and ensure alignment with their requirements.
  • Implemented and adopted cloud technologies and best practices for automation, configuration, monitoring and platform scalability.
  • Designed, coded, tested, debugged and documented programs using agile development practices.
  • Worked collaboratively with stakeholders to resolve technical roadblocks.



Hadoop Developer

Blue Cross Blue Shield
02.2022 - 11.2022
  • Worked with Hadoop, Spark, Scala, SQL, Spark SQL, Hive, Spark Delta Lake, Eclipse, Oozie, Redshift, and AWS S3.
  • Maintained Spark jobs in the existing ETL process, applying transformations and actions to manipulate raw data.
  • Developed system components involving Hadoop MapReduce and Hive.
  • Imported and exported data between servers, typically from Oracle servers into HDFS.
  • Converted raw text files into ORC files.
  • Ingested and loaded ORC files into HDFS and then into Hive tables.
  • Created, modified, and executed .hql scripts for loading files.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Generated EC2 instances on AWS under the project VPC.
  • Created S3 buckets and managed their access policies.
  • Worked with data warehouses such as Redshift.
  • Completed code reviews of Spark Scala jobs with junior associates.
  • Utilized Jira to track and manage issues, defects, and blockers.
  • The project's main objective was to build a data pipeline ingesting clinical data for vendors to use; toward the end of the project, worked with the team to complete the migration from on-prem to AWS.

Data Engineer

PayPal
12.2019 - 12.2021
  • Applied Unix/Linux scripting experience to build data pipelines.
  • Responsible for data migration from on-premises databases to Redshift and S3.
  • Created Hive tables, loaded data, and wrote Hive queries that run internally in MapReduce.
  • Optimized Hive queries with appropriate methods and parameters, taking advantage of Hadoop, YARN, Python, and PySpark.
  • Implemented and ingested REST-based web services.
  • Wrote MDX queries using SSAS.
  • Worked in the AWS ecosystem to develop and deploy custom Hadoop applications.
  • Created and scheduled SSIS jobs.
  • Designed and developed the stored procedures, queries, and views needed to support SSRS reports.
  • Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
  • Utilized Spark with Python and Spark SQL for faster testing and processing of data.
  • Responsible for integrating various RDBMS data sources such as Oracle and SQL Server.
  • Executed Sqoop jobs to migrate large sets of structured and semi-structured data between HDFS and storage tools such as Hive or an RDBMS.
  • Used Apache Airflow for workflow management and generated workflows in Python.
  • Developed an SSRS project to create interactive, visually appealing reports for business stakeholders by leveraging SQL Server Reporting Services.
  • Designed Hadoop clusters on multiple EC2 instances in AWS.
  • Completed a proof of concept on the AWS Athena service.

Hadoop Developer

Sprint
09.2017 - 10.2019
  • Attended daily team meetings and participated in business meetings with the client to gather security requirements.
  • Assisted the architect in analyzing the existing and future systems; prepared design blueprints and application-flow documentation.
  • Managed and evaluated Hadoop log files.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Monitored data coming from different sources and supported MapReduce programs running on the cluster.
  • Imported log files from various sources into HDFS using Flume.
  • Wrote MapReduce programs to arrange and ingest the data.
  • Wrote Spark applications in Scala, with hands-on experience creating RDDs and applying transformations and actions.
  • Managed onshore-offshore meetings so onshore work could be handed off to the offshore team.
  • Implemented Spark RDD transformations to map business analysis and applied actions on top of those transformations.
  • Developed Spark programs using the Scala API to compare the performance of Spark with Hive and SQL.
  • Applied Spark with Scala and Spark SQL for faster testing and processing of data.
  • Utilized Apache Pig scripts to load data from and store data into Hive.
  • Supervised workflows using the Apache Oozie framework to automate tasks.
  • Worked with the NoSQL database HBase, creating HBase tables to load large sets of semi-structured data from various sources.

Education

Data Engineering - Computer Science Engineering

University at Buffalo, SUNY
New York, NY

Data Engineering/Data Science

TechScope Bootcamp
New York, NY

High School Diploma

Academy of Finance And Enterprise
New York, NY
06.2016

Skills

  • Programming: Scala, SQL, NoSQL, Python, Java
  • Big Data Components: Hadoop, Linux Shell Commands, Shell Scripts, Kafka, Spark, Apache NiFi, Delta Lake, GitHub, Apache Hive, HBase, MongoDB, IntelliJ, AWS, CI/CD, Jira, Impala, Sqoop, Kerberos, DBFS, AutoSys
  • Team Collaboration
  • Strong Communication Skills & Interpersonal Skills
  • Performance Evaluation and Optimization
