Subhash Khanal

Irving, USA

Summary

Experienced Data Engineer accomplished in designing, developing and maintaining highly scalable, secure and reliable data structures. Accustomed to working closely with system architects, software architects and design analysts to understand business or industry requirements to develop comprehensive data models. Proficient at developing database architectural strategies at the modeling, design and implementation stages.

Overview

7 years of professional experience

Work History

Data Solutions Architect

Bank of America
Plano, United States
07.2023 - Current
  • Engineered end-to-end data solutions by leveraging Big Data technologies, particularly the Hadoop ecosystem, to process, store, and analyze massive datasets, resulting in a 25% increase in processing speed and a 30% improvement in scalability
  • Improved data processing workflows by 50% by building and deploying robust data pipelines with Apache Spark and Apache Flink, strengthening real-time analytics capabilities
  • Led cross-functional teams to gather requirements, resulting in a 50% increase in project efficiency
  • Reduced cluster downtime by 6 hours per month through the application of performance tuning, resource management, and troubleshooting methods, resulting in a savings of $350,000 in maintenance expenses
  • Strengthened data security by implementing encryption protocols, reducing data breaches by 30% and maintaining full compliance with data governance policies across more than 10,000 sensitive records
  • Implemented data profiling processes that reduced the time needed for analysis by 25%, enabling faster identification of trends and patterns crucial for strategic business decisions
  • Trained 4 teams on Big Data tools, resulting in a 50% increase in data-driven decision-making within the organization
  • Implemented new data engineering solutions by continuously enhancing skills in emerging technologies in the Big Data landscape, resulting in improved performance and scalability
  • Designed and implemented a scalable data architecture solution, resulting in a 30% increase in overall system performance.

Data Engineer

CVS Health
Austin, United States
01.2020 - 08.2023
  • Architected and implemented scalable distributed data solutions using AWS services, ETL processes, and Data Lake applications while ensuring data quality, efficient storage, and analytics based on business user requirements
  • Collaborated with architects to translate functional and technical requirements into detailed architecture and design, building scalable distributed data solutions using AWS services
  • Validated transactional and profile data from RDBMS, transforming and loading it into Data Lake using AWS Cloud Services and automating S3 file system processes
  • Developed and tested ETL processes in AWS Glue, migrating campaign data from external sources like S3 and various file formats into AWS Redshift
  • Utilized Python and SQL scripts to import and export structured data between relational databases, S3, and AWS RDS using Spark, EC2, and EMR clusters
  • Implemented end-to-end Apache Airflow design and development, facilitating communication between middleware and EBI teams and executing critical actions
  • Developed monitoring reports and dashboards for Spark jobs, leveraging text analytics and in-memory computing capabilities like Apache Spark with Python, and troubleshooting production-level issues
  • Utilized Parquet file format and HBase tables for efficient storage and performance, working with NoSQL databases like HBase
  • Managed data from multiple sources, maintaining HDFS and loading structured/unstructured data for diverse processing needs
  • Developed RESTful API using Python to track open-source GitHub projects and implemented machine learning methods using Spark, Python, Hadoop, and HBase
  • Designed data analysis pipelines with Python, leveraging AWS services like S3, EC2, and Elastic MapReduce for efficient processing and storage
  • Applied advanced text analytics using Apache Spark and developed analytic systems with Python and Scala-based ML Libraries.

Data Engineer

Citi
Irving, United States
05.2017 - 01.2020
  • Developed and deployed Spark applications using Scala for Hadoop transitions, implemented a Microservices architecture with Spring Boot, managed cloud-based storage and processing in AWS HDFS, and collaborated on Hadoop-based Data Lake initiatives
  • Streamlined data ingestion pipelines, monitored Hadoop cluster operations, and migrated MapReduce programs to Spark transformations, using Scala and Python for analysis and optimization, ultimately improving data processing efficiency across the organization
  • Developed Spark applications using Scala to facilitate Hadoop transitions, enhancing performance on the Hortonworks Data Platform
  • Utilized Microservices architecture with Spring Boot-based services, building and deploying enterprise-level software products
  • Managed cloud-based storage and processing in AWS HDFS, deploying applications using ELBs and EC2 instances
  • Created Hive tables and read Parquet data using Scala API, Spark, and Spark SQL for faster processing and testing
  • Analyzed SQL scripts and designed solutions by implementing Spark programs that optimized data processing tasks
  • Collaborated with the Big Data Architecture team to establish a Hadoop-based Data Lake for organization-wide analytics initiatives
  • Extracted real-time data feeds with Spark Streaming, converting them to RDDs and processing the data as DataFrames in HDFS
  • Developed data pipelines for ingesting customer behavioral data into HDFS using Sqoop, Pig, and Java MapReduce
  • Aggregated log data with Apache Flume, staging data in HDFS for further analysis and processing
  • Managed Hadoop cluster operations, including installation, upgrades, capacity planning, and troubleshooting MapReduce job execution issues
  • Utilized Scala and Python for interactive and batch analysis, developing Spark jobs for efficient data processing tasks.

Education

Bachelor of Science (BS) - Statistics

The University of Texas at Austin

Associate's Degree - Computer Science

Dallas College

Skills

  • Python
  • Data Mining
  • AWS
  • Data Warehousing
  • ETL
  • Docker
  • Hadoop
  • Git
  • Hive
  • NoSQL
  • MapReduce
  • Machine Learning
  • Scala
  • Spark
  • PyTorch
  • SQL
  • Kafka
  • Azure
  • Deep Learning
  • Kubernetes
  • Linux
  • Pig
  • Java
  • TensorFlow
  • Statistical Analysis
  • Apache Storm
  • Apache Flink
  • Data Modeling
  • Data Visualization
  • Predictive Analytics
  • Data Pipelines
  • Shell Scripting

Personal Information

Title: Big Data Engineer
