
Manisha Basani

Jacksonville, Florida

Summary

Background includes data mining, data warehousing, and analytics. Proficient in cleansing and organizing data. Quality-driven and hardworking, with excellent communication and project-management skills.

Overview

7 years of professional experience

Work History

Data Engineer

Florida Blue
Jacksonville, United States
08.2019 - Current
  • Built a generic data ingestion framework to extract data from multiple sources such as Oracle, delimited flat files, Parquet, and JSON, using it to build Hive/Impala tables.
  • Worked on migrating data to AWS S3 using NIFI and DISTCP.
  • Worked on building streams, tasks, and procedures in Snowflake to ingest, transform, and write data into new tables.
  • Responsible for design, development and maintenance of workflows to integrate Shell-actions, Java-actions, Sqoop-actions, Hive-actions and Spark-actions into Oozie workflow nodes to run data pipelines.
  • Used Python to parse XML files and create flat files from them.
  • Worked with Spark DataFrames, Datasets, and RDDs using Python to transform data and load it into Hive tables based on requirements.
  • Extensively worked with PySpark/Spark SQL for data cleansing and for generating DataFrames and RDDs.
  • Analyzed SQL scripts for the design and implementation of solutions using PySpark.
  • Used Snowflake and Impala for low-latency queries, visualization, and faster querying.
  • Imported, exported and appended incremental data into HDFS using Sqoop or NIFI from various sources and ingested it into Hive/Snowflake tables.
  • Used HBase to support front end applications that require very low latency.
  • Built a data quality framework using Java and Impala to run data rules, generate reports, and send daily email notifications of business-critical job successes and failures to business users.
  • Determined the size of data and the level of computation required to process it, and leveraged Spark to transform data and compute aggregations.
  • Handled the design of multi-tenancy on our data platform to allow other teams to run their applications.
  • Worked on configuration and automation of workflows using Control-M and helped production support teams understand operational, scheduling, and monitoring activities.
  • Created partitioned tables in Hive for better performance and faster querying.
  • Worked on debugging and performance tuning of Hive, Spark and Snowflake jobs.
  • Processed JSON files using Pyspark and ingested data into Hive tables.
  • Automated jobs for pulling or sending files from and to SFTP servers according to business requirements.

Business Data Analyst

MUFG Bank
Los Angeles, CA
11.2018 - 08.2019
  • Worked on SQL (Oracle, SQL Server, PostgreSQL) for data analysis, data profiling, and data mapping, including SELECT queries, joins, aggregations, and window functions.
  • Solid understanding of relational databases, ETL processes (file- and database-centric), and related tools.
  • Expertise in MS Excel and Microsoft Office (including Visio).
  • Performed gap analyses for both data and ETL processes.
  • Worked extensively with data governance team to maintain data models, metadata and data dictionaries according to enterprise standards.
  • Developed Python scripts to pre-process or clean data and generated flat files to build tables.
  • Communicated and coordinated with cross-functional teams to understand and document business requirements.

Data Analyst

Ally Bank
Charlotte, North Carolina
02.2018 - 11.2018
  • Manipulated, cleaned, and processed data using SQL, Python, and Excel.
  • Worked with flat files such as JSON and XML and performed data ingestion.
  • Created dashboards using Tableau for data visualization.
  • Performed analysis on existing datasets and changed internal schema for performance.
  • Generated and documented weekly, bi-weekly, and monthly reports for business users.
  • Performed data mapping from the source system to the target and participated in the design and development of the application.

Data Analyst Intern

Extarc Software Solutions
Hyderabad, Telangana
06.2013 - 12.2015
  • Worked with development teams, business users and source system teams to build data lineage of data lifecycle.
  • Collected and documented all metadata of existing tables and made sure that data types are consistent across the board.
  • Created use case specifications, business flow diagrams, and sequence diagrams to help developers and other stakeholders understand the business process, including possible alternate scenarios.
  • Performed data mapping and data profiling from the source system to target and participated in the design and development of ETL application.

Education

Master of Science - Information Technology

University of Mary Hardin Baylor
Belton, TX
12.2017

Bachelor of Science - Biotechnology

Gokaraju Rangaraju
Hyderabad
07.2015

Skills

  • Big Data Ecosystem: HDFS, MapReduce, Spark, Hive, Impala, HBase, Sqoop, Cloudera Hue, Kafka, Oozie, AWS S3, EC2, EMR, Glue, and Athena
  • Languages: Python, HiveQL, SQL, PL/SQL, Snowflake
  • Database Systems: Oracle 11g/10g, MS SQL Server, IBM DB2, Greenplum, pgAdmin
  • NoSQL Database: HBase, Cassandra
  • Reporting Tools: Tableau, PowerBI
  • IDEs: Eclipse, STS, PyCharm
  • Scripting Tools: UNIX Shell Scripting, Perl
  • Operating Systems: Linux, Unix, Windows XP/Vista/7/10
  • Scheduling Tools: Control-M, Tidal Enterprise Scheduler, Crontab, AutoSys
