Subhash Khanal

Irving, USA

Summary

Experienced Data Engineer accomplished in designing, developing, and maintaining highly scalable, secure, and reliable data structures. Accustomed to working closely with system architects, software architects, and design analysts to understand business and industry requirements and develop comprehensive data models. Proficient at developing database architecture strategies across the modeling, design, and implementation stages.

Overview

years of professional experience

Work History

Data Solutions Architect

Bank of America

Plano, United States

07.2023 - Current

Engineered end-to-end data solutions using Big Data technologies, particularly the Hadoop ecosystem, to process, store, and analyze massive datasets, increasing processing speed by 25% and improving scalability by 30%
Built and deployed robust data pipelines with Apache Spark and Apache Flink, improving data processing workflows by 50% and strengthening real-time analytics capabilities
Led cross-functional teams to gather requirements, resulting in a 50% increase in project efficiency
Reduced cluster downtime by 6 hours per month through performance tuning, resource management, and troubleshooting, saving $350,000 in maintenance expenses
Strengthened data security by implementing encryption protocols, reducing data breaches by 30% and ensuring full compliance with data governance policies protecting over 10,000 sensitive records
Implemented data profiling processes that reduced the time needed for analysis by 25%, enabling faster identification of trends and patterns crucial for strategic business decisions
Trained 4 teams on Big Data tools, resulting in a 50% increase in data-driven decision-making within the organization
Continuously built skills in emerging Big Data technologies and applied them to new data engineering solutions, improving performance and scalability
Designed and implemented a scalable data architecture, increasing overall system performance by 30%

Data Engineer

CVS Health

Austin, United States

01.2020 - 08.2023

Architected and implemented scalable distributed data solutions using AWS services, ETL processes, and Data Lake applications while ensuring data quality, efficient storage, and analytics based on business user requirements
Collaborated with architects to translate functional and technical requirements into detailed architecture and design, building scalable distributed data solutions using AWS services
Validated transactional and profile data from RDBMS, transforming and loading it into Data Lake using AWS Cloud Services and automating S3 file system processes
Developed and tested ETL processes in AWS Glue, migrating campaign data in various file formats from external sources such as S3 into Amazon Redshift
Utilized Python and SQL scripts to import and export structured data between relational databases, S3, and AWS RDS using Spark, EC2, and EMR clusters
Designed and developed end-to-end Apache Airflow workflows, coordinating between the middleware and EBI teams and executing critical actions
Developed monitoring reports and dashboards for Spark jobs, leveraging text analytics and in-memory computing capabilities like Apache Spark with Python, and troubleshooting production-level issues
Used the Parquet file format and HBase (NoSQL) tables for efficient storage and performance
Managed data from multiple sources, maintaining HDFS and loading structured/unstructured data for diverse processing needs
Developed a RESTful API in Python to track open-source GitHub projects, and implemented machine learning methods using Spark, Python, Hadoop, and HBase
Designed data analysis pipelines with Python, leveraging AWS services like S3, EC2, and Elastic MapReduce for efficient processing and storage
Applied advanced text analytics using Apache Spark and developed analytic systems with Python and Scala-based ML libraries

Data Engineer

Citi

Irving, United States

05.2017 - 01.2020

Developed and deployed Spark applications in Scala for Hadoop transitions, implemented a microservices architecture with Spring Boot, managed cloud-based storage and processing with HDFS on AWS, and collaborated on Hadoop-based Data Lake initiatives
Streamlined data ingestion pipelines, monitored Hadoop cluster operations, and migrated MapReduce programs to Spark transformations, using Scala and Python for analysis and optimization to improve data processing efficiency across the organization
Developed Spark applications using Scala to facilitate Hadoop transitions, enhancing performance on the Hortonworks Data Platform
Used a microservices architecture with Spring Boot-based services to build and deploy enterprise-level software products
Managed cloud-based storage and processing with HDFS on AWS, deploying applications using ELBs and EC2 instances
Created Hive tables and read Parquet data using the Scala API, Spark, and Spark SQL for faster processing and testing
Analyzed SQL scripts and designed solutions by implementing equivalent Spark programs, optimizing data processing tasks
Collaborated with the Big Data Architecture team to establish a Hadoop-based Data Lake for organization-wide analytics initiatives
Extracted real-time data feeds with Spark Streaming, converting them to RDDs and processing the data as DataFrames in HDFS
Developed data pipelines for ingesting customer behavioral data into HDFS using Sqoop, Pig, and Java MapReduce
Aggregated log data with Apache Flume, staging data in HDFS for further analysis and processing
Managed Hadoop cluster operations, including installation, upgrades, capacity planning, and troubleshooting MapReduce job execution issues
Used Scala and Python for interactive and batch analysis, developing Spark jobs for efficient data processing

Education

Bachelor of Science (BS) - Statistics

The University of Texas at Austin

Associate's Degree - Computer Science

Dallas College

Skills

Python
Data Mining
AWS
Data Warehousing
ETL
Docker
Hadoop
Git
Hive
NoSQL
MapReduce
Machine Learning
Scala
Spark
PyTorch
SQL

Kafka
Azure
Deep Learning
Kubernetes
Linux
Pig
Java
TensorFlow
Statistical Analysis
Apache Storm
Apache Flink
Data Modeling
Data Visualization
Predictive Analytics
Data Pipelines
Shell Scripting

Personal Information

Title: Big Data Engineer
