Over 4 years of hands-on experience in data engineering, focusing on designing and optimizing complex data pipelines.
Proficient in ETL processes, data integration, and real-time data processing using tools like Apache Spark, Kafka, and AWS Glue.
Expertise in cloud platforms including AWS and Azure, with a strong background in deploying scalable data solutions.
Skilled in working with Big Data technologies such as Hadoop, Hive, and MapReduce to manage and process large datasets.
Extensive experience in SQL and Python for data manipulation, analysis, and building data-driven solutions.
Strong knowledge of data warehousing concepts, with hands-on experience in Redshift, Snowflake, and similar technologies.
Adept at building CI/CD pipelines using Jenkins, Docker, and Kubernetes, ensuring efficient and reliable deployment processes.
Proficient in using Control-M for batch processing and job scheduling, optimizing workflow automation.
Experienced in Unix/Linux environments, leveraging shell scripting for automation and system management.
Familiar with machine learning frameworks like Keras and TensorFlow, integrating ML models into data pipelines.
Strong problem-solving skills with a focus on improving data processing efficiency and reducing operational costs.
Excellent collaboration and communication skills, working effectively with cross-functional teams to deliver data-driven insights.
Overview
4 years of professional experience
Work History
Data Engineer
Meditab Software Inc.
03.2023 - 07.2024
Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability
Collaborated with cross-functional teams to implement CI/CD pipelines using Jenkins and Terraform, streamlining deployment process by 30%
Built and optimized Snowflake data models, enhancing data warehousing efficiency and improving query performance by 15%
Developed Spark applications using PySpark and optimized processing efficiency, resulting in a 20% reduction in processing time for large datasets
Orchestrated end-to-end AWS-based data pipelines, integrating data from various sources into Amazon S3, and performed data transformations using AWS Glue
Research Assistant
University of Pittsburgh, Joseph M. Katz Graduate School of Business
09.2022 - 12.2023
Organized research materials, maintaining a well-ordered workspace conducive to productivity
Designed ML algorithms (TensorFlow, PyTorch, Bayesian Networks) for breast cancer severity assessment, achieving 95% accuracy and aiding early diagnosis
Streamlined ETL processes with Hive and designed scalable data pipelines using Apache Spark, boosting data processing efficiency by 30% and real-time analytics capabilities by 25%, enabling more informed business decisions
Integrated Linear Regression for sales forecasting and K-Means clustering for customer segmentation into marketing, sales, and customer service workflows, boosting departmental efficiency and overall operational effectiveness by 15%
Computer Vision Engineer
Human Engineering Research Laboratories
05.2023 - 09.2023
Enhanced YOLOv3 for real-time object detection to 99.93% accuracy, reducing navigation errors by 50%; integrated ORB-SLAM for mapping with A* and RRT path planning algorithms, reducing task completion time by 20%
Refined Random Forest models to boost precision in predicting user behavior, achieving a 30% accuracy gain and 25% efficiency improvement over prior algorithms and enhancing insights into user preferences and engagement patterns
Developed and integrated collaborative filtering and matrix factorization algorithms, increasing customer engagement by 18% and boosting sales by 15% through tailored recommendations
AI Developer
Dosepacker
04.2020 - 07.2022
Configured and worked with big data technologies (Hadoop, Sqoop, Spark, Hive, and HBase) on AWS EMR clusters, achieving a 20% improvement in performance
Assisted in the migration of legacy data systems to AWS Redshift, reducing infrastructure costs by 15%
Analyzed large datasets to identify trends and patterns in customer behavior
Implemented data pipelines with Apache Airflow for scheduling and monitoring ETL workflows, ensuring data accuracy and timeliness
Applied Snowflake features such as Snowpipe and stages to process 100 GB of data daily in real time and ingest it into Snowflake tables
Employed Apache Spark with Random Forest and Gradient Boosting to improve model accuracy, and optimized data processing with AWS Glue; improved sales forecasting, inventory management, and customer behavior analysis, boosting supply chain productivity
Developed targeted marketing campaigns using K-means clustering for advanced customer segmentation, optimizing campaign effectiveness and personalizing customer interactions to achieve a 20% boost in ROI