
Samiha Zarin Latif

Data Engineer
Queens, New York

Summary

I am a highly skilled data engineer with 4+ years of experience, proficient in SQL, Python, Java, Unix/Linux, Spark/PySpark, Hive, Sqoop, Oozie, Spark UI, Airflow, and Ansible for automation, as well as the Microsoft Office Suite. I also have strong expertise in database modeling, Elasticsearch/Kibana, Excel, and Tableau as tools for data analysis. Alongside my technical skills, I bring critical thinking, problem-solving, adaptability, organization, time management, risk management, and attention to detail. With a focus on delivering high-quality results, I excel at taking the lead and guiding team members, making me a valuable asset to your team.

Skill Summary
  • Strong analytical skills and broad knowledge of domains including healthcare, security, retail, and the financial sector
  • Extensive knowledge and experience with Big Data tools and frameworks such as Hadoop and Spark
  • Strong understanding of data modeling, data quality analysis, and data profiling
  • Experience building and maintaining data visualization dashboards using tools like Tableau and Kibana
  • Experienced in programming languages such as Python, Java, and Scala, as well as Unix/Linux shell scripting

Overview

4 years of professional experience
3 Certifications

Work History

Data Engineer

Microsoft
01.2022 - Current
  • Expertise in SQL databases such as Oracle, MySQL, and Teradata
  • In-depth knowledge of NoSQL databases such as HBase, MongoDB, and AWS DynamoDB
  • Orchestrated jobs on Hadoop cluster, AWS and Mainframe systems using Cron jobs, Oozie and Python DAGs with Airflow
  • Worked with Spark Scala and/or PySpark to create ETL pipelines
  • Utilized MySQL to query data, then used Sqoop to extract data from RDBMS sources into Hadoop
  • Developed, captured, and documented architectural best practices for building systems on AWS
  • Helped the data services team deliver faster by automating back-end interactions between Elasticsearch, Kibana, and Azure Event Hub using Ansible commands and Unix/shell scripts
  • Gained hands-on experience in designing and maintaining adaptive and highly reliable data pipelines with Big Data tools and Cloud (Azure) storage capabilities
  • Increased debugging efficiency of data pipeline management processes by setting up a secondary data pipeline in the Azure environment to fetch log data from the ELK Stack and display it in Event Hubs
  • Utilized SQL Stored Procedures and orchestrated them in different environments
  • Supported complex query optimization
  • Expertise working with Azure Databricks and using Spark and SQL to extract data for reporting
  • Experienced in installation, configuration, upgrades, capacity planning, performance tuning, backup, and recovery when managing SQL Server clusters
  • Experienced with reporting tools like Tableau and Kibana, generating data-driven reports to improve communication with stakeholders

Hadoop Developer

Citibank
01.2021 - 01.2022
  • Utilized Spark core modules and developed generic Spark Scala functions for big data transformations and aggregations, and designed schemas for HDFS tables to be translated into Tableau reports
  • Employed Sqoop to efficiently transfer data between Hadoop and relational databases, enhancing data integration capabilities
  • Orchestrated workflows using Oozie to ensure efficient task scheduling and execution
  • Wrote, debugged, and tuned transactional SQL, ETL, and stored procedures
  • Utilized Spark/PySpark and Hive for big data processing and analytics, extracting actionable insights for business decision-making
  • Deeply involved in writing complex PySpark scripts, working with the Spark context and using multiple APIs and methods that support data frames
  • Explored Spark optimization techniques and PySpark modules for improving the performance of existing data transformation algorithms in Hadoop while monitoring status of data processes with YARN
  • Built Sqoop jobs to import data from RDBMS sources and tuned Sqoop scripts to move large datasets between Hive and the RDBMS
  • Used Spark SQL with Python for creating data frames and to perform transformations on data frames like adding schema manually, casting, and joining data frames before storing them

Data Engineer

BCBS
11.2019 - 11.2020
  • Worked with various HDFS file formats, including Avro, SequenceFile, Parquet, and text files
  • Utilized Apache Spark to perform data migrations, data cleansing, and other operations on large datasets
  • Experience in building data dictionaries, functions, and synonyms for NoSQL (Elasticsearch)
  • Used Bitbucket to manage repositories, maintained the branching and build/release strategies utilizing GIT and Bitbucket
  • Utilized MySQL to query data, then used Sqoop to extract data from RDBMS sources into the Hadoop environment
  • Used cron jobs to orchestrate jobs in the cluster
  • Worked extensively on converting existing MapReduce jobs into Spark Scala jobs
  • Loaded data into Spark RDDs and performed in-memory computation to generate outputs
  • Implemented Apache JMeter and conducted load testing and capacity planning to ensure that data systems can handle large volumes of data and traffic
  • Wrote Bash shell scripts on Linux to automate repetitive tasks, perform data transformations, and execute complex workflows
  • Wrote Java code to format XML files
  • Experienced working with Java J2EE, JDBC, ODBC, and the Eclipse IDE
  • Leveraged Redshift with services such as AWS Glue and AWS Data Pipeline to extract, transform, and load (ETL) data from various sources into Redshift, ensuring the data was up to date and readily available for analysis
  • Worked with AWS, creating EC2 instances used to set up compute resources for multiple data engineering workloads

Education

Master of Business Information Systems

Monash University

Bachelor’s - Computer Science

Monash University

Certification

Solutions Architect Associate, Amazon Web Services (IP)

Timeline

Data Engineer

Microsoft
01.2022 - Current

Hadoop Developer

Citibank
01.2021 - 01.2022

Data Engineer

BCBS
11.2019 - 11.2020

Master of Business Information Systems

Monash University

Bachelor’s - Computer Science

Monash University