AFRAN HOSSAIN

Jackson Heights, NY

Summary

Data Engineer with about six years of experience and broad exposure to a wide range of tools, technologies, and services, with plenty of room still to learn and grow. Highly motivated and communicative, with a genuine love of learning anything new, especially in the Data Engineering field.

Overview

6 years of professional experience

Work History

PySpark Developer/Lead Software Engineer

Wells Fargo
11.2022 - Current
  • Created logical data models with the team to define relationships between data entities and understand how data would be integrated from various source systems.
  • Utilized PySpark to read data from CSV, Parquet, and JSON files.
  • Used PySpark DataFrames and Spark SQL to clean, transform, and prepare raw data for analysis.
  • Developed a Python batch process that allowed the team to differentiate between customers.
  • Worked with AutoSys to schedule PySpark jobs and ensure they ran at specific times, which enhanced onshore and offshore team collaboration.
  • Migrated raw data from various source systems into the Data Lake.
  • Wrote DDL scripts for the tables created in the DEV, SIT, and UAT environments.
  • Created Hive tables on the server (accessed via PuTTY), enabling Spark jobs to complete the batch process.
  • Worked with an in-house data control framework to perform different levels of data validation.
  • Generated technical design documents for completed development to review with the business team and ensure alignment with their requirements.
  • Implemented and adopted cloud technologies and best practices for automation, configuration, monitoring and platform scalability.
  • Designed, coded, tested, debugged and documented programs using agile development practices.
  • Worked collaboratively with stakeholders to resolve technical roadblocks.



Hadoop Developer

Blue Cross Blue Shield
02.2022 - 11.2022
  • Worked with Hadoop, Spark, Scala, SQL, Spark SQL, Hive, Spark Delta Lake, Eclipse, Oozie, Redshift, and AWS S3.
  • Maintained Spark jobs in the existing ETL process, applying transformations and actions to manipulate raw data.
  • Developed system components involving Hadoop MapReduce and Hive.
  • Imported and exported data between servers, typically from Oracle servers into HDFS.
  • Converted raw text files into ORC files.
  • Ingested and loaded ORC files into HDFS and then into Hive tables.
  • Created, modified, and executed .hql scripts for loading files.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Generated EC2 instances on AWS under the project VPC.
  • Created S3 buckets and managed their access policies.
  • Worked with data warehouses such as Redshift.
  • Completed code reviews of Spark Scala jobs with junior associates.
  • Utilized Jira to track and manage issues, defects, and blockers.
  • The project's main objective was to build a data pipeline ingesting clinical data for vendors to use; toward the end of the project, worked with the team to complete the migration from on-prem to AWS.

Data Engineer

PayPal
12.2019 - 12.2021
  • Applied Unix/Linux scripting experience to build data pipelines.
  • Responsible for data migration from on-premises databases to Redshift and S3.
  • Created Hive tables, loaded data, and wrote Hive queries that run internally in MapReduce.
  • Optimized Hive queries with appropriate methods and parameters, taking advantage of Hadoop, YARN, Python, and PySpark.
  • Implemented and ingested REST-based web services.
  • Wrote MDX queries using SSAS.
  • Worked in the AWS ecosystem to develop and deploy custom Hadoop applications.
  • Created and scheduled SSIS jobs.
  • Designed and developed the stored procedures, queries, and views needed to support SSRS reports.
  • Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
  • Utilized Spark with Python and Spark SQL for faster testing and processing of data.
  • Responsible for integrating various RDBMS data sources such as Oracle and SQL Server.
  • Executed Sqoop jobs to migrate large sets of structured and semi-structured data between HDFS and storage tools such as Hive or an RDBMS.
  • Used Apache Airflow for workflow management and generated workflows in Python.
  • Developed an SSRS project to create interactive, visually appealing reports for business stakeholders by leveraging SQL Server Reporting Services.
  • Designed Hadoop clusters on multiple EC2 instances in AWS.
  • Completed a proof of concept on the AWS Athena service.

Hadoop Developer

Sprint
09.2017 - 10.2019
  • Attended daily team meetings and participated in business meetings with the client to gather security requirements.
  • Assisted the architect in analyzing the existing and future systems; prepared design blueprints and application-flow documentation.
  • Managed and evaluated Hadoop log files.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Monitored data coming from different sources and supported MapReduce programs running on the cluster.
  • Imported log files from various sources into HDFS using Flume.
  • Wrote MapReduce programs to arrange and ingest the data.
  • Wrote Spark applications in Scala, with hands-on experience creating RDDs and applying transformations and actions.
  • Managed onshore-offshore meetings so onshore work could be handed off to the offshore team.
  • Implemented Spark RDD transformations to map business analysis and applied actions on top of those transformations.
  • Developed Spark programs using the Scala API to compare the performance of Spark with Hive and SQL.
  • Applied Spark with Scala and Spark SQL for faster testing and processing of data.
  • Utilized Apache Pig scripts to load data from and store data into Hive.
  • Supervised workflows using the Apache Oozie framework to automate tasks.
  • Worked with the NoSQL database HBase, creating HBase tables to load large sets of semi-structured data from various sources.

Education

Data Engineering - Computer Science Engineering

University at Buffalo, SUNY
New York, NY

Data Engineering/Data Science

TechScope Bootcamp
New York, NY

High School Diploma

Academy of Finance And Enterprise
New York, NY
06.2016

Skills

  • Programming: Scala, SQL, NoSQL, Python, Java
  • Big Data Components: Hadoop, Linux Shell Commands, Shell Scripts, Kafka, Spark, Apache NiFi, Delta Lake, GitHub, Apache Hive, HBase, MongoDB, IntelliJ, AWS, CI/CD, Jira, Impala, Sqoop, Kerberos, DBFS, AutoSys
  • Team Collaboration
  • Strong Communication Skills & Interpersonal Skills
  • Performance Evaluation and Optimization
