
Samiha Zarin Latif

Data Engineer
Queens, New York

Summary

I am a highly skilled data engineer with 4+ years of experience, proficient in SQL, Python, Java, Unix/Linux, Spark/PySpark, Hive, Sqoop, Oozie, Spark UI, Airflow, and Ansible for automation, as well as the Microsoft Office Suite. I also have strong expertise in database modeling, Elasticsearch/Kibana, Excel, and Tableau as tools for data analysis. Alongside my technical skills, I bring critical thinking, problem-solving, adaptability, organization, time management, risk management, and attention to detail. With a focus on delivering high-quality results, I excel at taking the lead and guiding team members, making me a valuable asset to your team.

Skill Summary
  • Strong analytical skills and broad knowledge of domains including healthcare, security, retail, and the financial sector
  • Extensive knowledge and experience with Big Data tools and frameworks such as Hadoop and Spark
  • Strong understanding of data modeling, data quality analysis, and data profiling
  • Experience building and maintaining data visualization dashboards using tools like Tableau and Kibana
  • Experienced in programming languages such as Python, Java, and Scala, as well as Unix/Linux shell scripting

Overview

4 years of professional experience
3 Certifications

Work History

Data Engineer

Microsoft
01.2022 - Current
  • Expertise in SQL databases such as Oracle, MySQL, and Teradata
  • In-depth knowledge of NoSQL databases such as HBase, MongoDB, and AWS DynamoDB
  • Orchestrated jobs on Hadoop cluster, AWS and Mainframe systems using Cron jobs, Oozie and Python DAGs with Airflow
  • Worked with Spark Scala and/or PySpark to create ETL pipelines
  • Utilized MySQL to query data, then used Sqoop to extract data from RDBMS sources into Hadoop
  • Developed, captured, and documented architectural best practices for building systems on AWS
  • Helped the data services team deliver faster by automating back-end interactions between Elasticsearch, Kibana, and Azure Event Hub using Ansible commands and Unix/shell scripts
  • Gained hands-on experience in designing and maintaining adaptive and highly reliable data pipelines with Big Data tools and Cloud (Azure) storage capabilities
  • Increased debugging efficiency of data pipeline management processes by setting up a secondary data pipeline in the Azure environment to fetch log data from the ELK Stack and display it in Event Hubs
  • Utilized SQL Stored Procedures and orchestrated them in different environments
  • Supported complex query optimization
  • Expertise working with Azure Databricks and using Spark and SQL to extract data for reporting
  • Experienced in installation, configuration, upgrades, capacity planning, performance tuning, backup, and recovery when managing SQL Server clusters
  • Experienced with reporting tools like Tableau and Kibana, generating data-driven reports to improve communication with stakeholders

Hadoop Developer

Citibank
01.2021 - 01.2022
  • Utilized Spark core modules and developed generic Spark Scala functions for big data transformations and aggregations, and designed schemas for HDFS tables to be translated into Tableau reports
  • Employed Sqoop to efficiently transfer data between Hadoop and relational databases, enhancing data integration capabilities
  • Orchestrated workflows using Oozie to ensure efficient task scheduling and execution
  • Wrote, debugged, and tuned transactional SQL, ETL, and stored procedures
  • Utilized Spark/PySpark and Hive for big data processing and analytics, extracting actionable insights for business decision-making
  • Deeply involved in writing complex PySpark scripts, working with the Spark context and using multiple APIs and methods that support data frames
  • Explored Spark optimization techniques and PySpark modules for improving the performance of existing data transformation algorithms in Hadoop while monitoring status of data processes with YARN
  • Built Sqoop jobs to import data from RDBMS sources and tuned Sqoop scripts to move large datasets between Hive and the RDBMS
  • Used Spark SQL with Python for creating data frames and to perform transformations on data frames like adding schema manually, casting, and joining data frames before storing them

Data Engineer

BCBS
11.2019 - 11.2020
  • Worked with various HDFS file formats, including Avro, SequenceFile, Parquet, and text files
  • Utilized Apache Spark to perform data migrations, data cleansing, and other operations on large datasets
  • Experience in building data dictionaries, functions, and synonyms for NoSQL (Elasticsearch)
  • Used Bitbucket to manage repositories, maintained the branching and build/release strategies utilizing GIT and Bitbucket
  • Utilized MySQL to query data, then used Sqoop to extract data from RDBMS sources into the Hadoop environment
  • Used cron jobs to orchestrate jobs in the cluster
  • Worked extensively on converting existing MapReduce jobs into Spark Scala jobs
  • Loaded data into Spark RDDs and performed in-memory computation to generate outputs
  • Implemented Apache JMeter and conducted load testing and capacity planning to ensure that data systems can handle large volumes of data and traffic
  • Wrote Bash shell scripts on Linux to automate repetitive tasks, perform data transformations, and execute complex workflows
  • Wrote Java code to format XML files
  • Experienced working with Java J2EE, JDBC, ODBC, and the Eclipse IDE
  • Leveraged Redshift with services such as AWS Glue and AWS Data Pipeline to extract, transform, and load (ETL) data from various sources into Redshift, ensuring the data was up to date and readily available for analysis
  • Worked with AWS, creating EC2 instances used to set up compute resources for multiple data engineering workloads

Education

Master of Business Information Systems

Monash University

Bachelor’s - Computer Science

Monash University

Certification

Solutions Architect Associate, Amazon Web Services (IP)

Timeline

Data Engineer

Microsoft
01.2022 - Current

Hadoop Developer

Citibank
01.2021 - 01.2022

Data Engineer

BCBS
11.2019 - 11.2020

Master of Business Information Systems

Monash University

Bachelor’s - Computer Science

Monash University