A self-motivated graduate with a Master's degree in Data Science from UMBC and industry experience as a Software Engineer. Specialized in improving the efficiency, accuracy, and utility of internal data processing. Extensive experience building regression models, applying predictive data modeling, and analyzing data mining algorithms to deliver insights and implement action-oriented solutions to complex business problems.
• Contributed to the design and architecture of the complete application using PySpark and Snowflake.
• Developed the main data processing module in PySpark and integrated it with other modules stored in Snowflake.
• Tested and delivered the application with high consistency and reliability, ensuring efficient data pipelines and transformation processes.
• Enhanced the application in response to market changes and handled high-volume production data through optimized PySpark operations and Snowflake performance tuning.
• Created Databricks Workflows to automate PySpark jobs and Snowflake queries.
• Designed and maintained metadata for Snowflake tables and schemas to support data lineage and governance.
• Analyzed and transformed data using PySpark and Snowflake SQL; managed and monitored Spark job logs and Snowflake query history for performance insights and troubleshooting.
Data Migration Lead – Hive to BigQuery Migration Project:
• Led the migration of data processing scripts from Hive HQL to BigQuery SQL, ensuring compatibility with GCP’s serverless environment.
• Analyzed existing Hive scripts to understand data transformation logic, identifying any Hive-specific functions and converting them to equivalent BigQuery SQL functions.
• Developed and optimized BigQuery SQL queries to replicate Hive transformations while improving performance and leveraging BigQuery’s features, such as partitioning and clustering.
• Used Dataflow to migrate data pipelines and transform data, enabling a seamless transition from Hive to BigQuery.
• Created Airflow DAGs to schedule and automate data loading jobs from GCS to BigQuery, ensuring smooth data processing workflows.
• Ensured data quality and integrity by implementing testing frameworks to compare output data between Hive and BigQuery environments.
• Documented the end-to-end migration process, including best practices, limitations, and optimizations, to support ongoing maintenance and scalability of BigQuery operations.
• Collaborated with cross-functional teams to train team members on BigQuery, facilitating knowledge transfer and helping the team adapt to the GCP environment.