Background includes data mining, warehousing, and analytics. Proficient in cleansing and organizing data into new, more functional formats. Quality-driven and hardworking with excellent communication and project management skills.
Overview
7 years of professional experience
Work History
Data Engineer
Florida Blue
Jacksonville, FL
08.2019 - Current
Built a generic data ingestion framework to extract data from multiple sources such as Oracle, delimited flat files, Parquet, and JSON, and used it to build Hive/Impala tables.
Migrated data to AWS S3 using NiFi and DistCp.
Built streams, tasks, and stored procedures in Snowflake to ingest, transform, and write data into new tables.
Designed, developed, and maintained workflows integrating Shell, Java, Sqoop, Hive, and Spark actions into Oozie workflow nodes to run data pipelines.
Used Python to parse XML files and create flat files from them.
Worked with Spark DataFrames, Datasets, and RDDs in Python to transform and load data into Hive tables based on requirements.
Extensively used PySpark/Spark SQL for data cleansing and for generating DataFrames and RDDs.
Analyzed SQL scripts for the design and implementation of solutions using PySpark.
Used Snowflake and Impala for low-latency queries and visualization.
Imported, exported, and appended incremental data into HDFS from various sources using Sqoop or NiFi, and ingested it into Hive/Snowflake tables.
Used HBase to support front end applications that require very low latency.
Built a data quality framework using Java and Impala to run data rules, generate reports, and send business users daily email notifications of business-critical job successes and failures.
Determined the size of the data and the level of computation required to process it, and leveraged Spark to transform data and compute aggregations.
Handled the design of multi-tenancy on our data platform to allow other teams to run their applications.
Worked on configuration and automation of workflows using Control-M and helped the production support teams to understand operational, scheduling and monitoring activities.
Created partitioned tables in Hive for better performance and faster querying.
Worked on debugging and performance tuning of Hive, Spark and Snowflake jobs.
Processed JSON files using PySpark and ingested the data into Hive tables.
Automated jobs for pulling or sending files from and to SFTP servers according to business requirements.
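One of the ingestion steps above, parsing XML into delimited flat files with Python, can be sketched with the standard library alone. This is an illustrative sketch, not the actual pipeline code: the `<record>` element layout, field names, and `|` delimiter are assumptions.

```python
# Hypothetical XML-to-flat-file step; element/field names are assumptions.
import csv
import xml.etree.ElementTree as ET
from io import StringIO

def xml_to_flat(xml_text, fields, delimiter="|"):
    """Parse <record> elements and emit one delimited row per record."""
    root = ET.fromstring(xml_text)
    out = StringIO()
    writer = csv.writer(out, delimiter=delimiter)
    writer.writerow(fields)  # header row for the downstream Hive table
    for record in root.iter("record"):
        writer.writerow([record.findtext(f, default="") for f in fields])
    return out.getvalue()

sample = "<records><record><id>1</id><name>Ann</name></record></records>"
print(xml_to_flat(sample, ["id", "name"]))
```

A delimited file like this can then be loaded into a Hive external table whose row format declares the same field delimiter.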
Business Data Analyst
MUFG Bank
Los Angeles, CA
11.2018 - 08.2019
Used SQL (Oracle, SQL Server, PostgreSQL) for data analysis, data profiling, and data mapping, including SELECT queries, joins, aggregations, and window functions.
Solid understanding of relational databases, ETL processes (file- and database-centric), and related tools.
Expert in Microsoft Office, including Excel and Visio.
Performed gap analyses for both data and ETL processes.
Worked extensively with data governance team to maintain data models, metadata and data dictionaries according to enterprise standards.
Developed Python scripts to pre-process or clean data and generated flat files to build tables.
Communicated and coordinated with cross functional teams to understand and document business requirements.
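The profiling work above leans on joins, aggregations, and window functions. A minimal, hedged illustration of a windowed ranking query follows; the table and column names are invented, and it runs against SQLite here rather than Oracle/SQL Server/PostgreSQL.

```python
# Illustrative data-profiling query with a window function; the txn table
# and its columns are fabricated for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE txn (account TEXT, amount REAL);
    INSERT INTO txn VALUES ('A', 10), ('A', 30), ('B', 5);
""")
# Rank each transaction within its account by amount, largest first
rows = conn.execute("""
    SELECT account, amount,
           RANK() OVER (PARTITION BY account ORDER BY amount DESC) AS rnk
    FROM txn
    ORDER BY account, rnk
""").fetchall()
for account, amount, rnk in rows:
    print(account, amount, rnk)
```

The same `RANK() OVER (PARTITION BY ... ORDER BY ...)` shape carries over to the production databases named above with only dialect-level differences.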
Data Analyst
Ally Bank
Charlotte, NC
02.2018 - 11.2018
Manipulated, cleaned and processed data using SQL, Python and Excel
Ingested flat files such as JSON and XML.
Created dashboards using Tableau for data visualization.
Performed analysis on existing datasets and changed internal schema for performance.
Generated and documented weekly, bi-weekly, and monthly reports for business users.
Performed data mapping from source systems to targets and participated in the design and development of the application.
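Cleaning data of the kind described above, in pure Python, might look like the following sketch; the `email` column and the exact-duplicate rule are illustrative assumptions.

```python
# Minimal cleaning sketch: trim whitespace, normalize case, drop exact
# duplicate rows. Column names ("email") are illustrative assumptions.
import csv
from io import StringIO

def clean_rows(raw_csv):
    seen, cleaned = set(), []
    for row in csv.DictReader(StringIO(raw_csv)):
        row = {k: v.strip() for k, v in row.items()}   # trim stray whitespace
        row["email"] = row["email"].lower()            # normalize case
        key = tuple(row.values())
        if key not in seen:                            # drop exact duplicates
            seen.add(key)
            cleaned.append(row)
    return cleaned

raw = "name,email\n Ann ,ANN@X.COM\nAnn,ann@x.com\n"
print(clean_rows(raw))
```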
Data Analyst Intern
Extarc Software Solutions
Hyderabad, Telangana
06.2013 - 12.2015
Worked with development teams, business users, and source-system teams to build data lineage across the data lifecycle.
Collected and documented all metadata of existing tables and made sure that data types are consistent across the board.
Created use-case specifications, business flow diagrams, and sequence diagrams to help developers and other stakeholders understand the business process from their own perspectives, including possible alternate scenarios.
Performed data mapping and data profiling from source systems to targets and participated in the design and development of the ETL application.
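Checking that column data types stay consistent across tables, as mentioned above, can be sketched using SQLite's catalog as a stand-in for an enterprise metadata store; the tables and the mismatch here are fabricated for illustration.

```python
# Hypothetical consistency check: compare declared column types for columns
# shared across tables, using SQLite's PRAGMA table_info as a toy catalog.
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, signup_date TEXT);
    CREATE TABLE orders    (id INTEGER, signup_date INTEGER);
""")

types = defaultdict(set)
for table in ("customers", "orders"):
    for _, name, coltype, *_ in conn.execute(f"PRAGMA table_info({table})"):
        types[name].add(coltype)

# Columns whose declared type differs between tables
mismatches = sorted(col for col, ts in types.items() if len(ts) > 1)
print(mismatches)
```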
Education
Master of Science - Information Technology
University of Mary Hardin-Baylor
Belton, TX
12.2017
Bachelor of Science - Biotechnology
Gokaraju Rangaraju
Hyderabad
07.2015
Skills
Big Data Ecosystem: HDFS, Map Reduce, Spark, Hive, Impala, HBase, Sqoop, Cloudera Hue, Kafka, Oozie, AWS S3, EC2, EMR, Glue and Athena
Languages: Python, HiveQL, SQL, PL/SQL, Snowflake
Database Systems: Oracle 11g/10g, MS SQL Server, IBM DB2, Greenplum, pgAdmin
NoSQL Database: HBase, Cassandra
Reporting Tools: Tableau, Power BI
IDEs: Eclipse, STS, PyCharm
Scripting Tools: UNIX Shell Scripting, Perl
Operating System: Linux, Unix, Windows 7/Vista/XP/10
Scheduling Tools: Control-M, Tidal Enterprise Scheduler, Crontab, Autosys