
Accomplished Data Engineer with 2.9 years of IT experience, specializing in data modelling, data governance, and cloud-based Big Data solutions. Proficient in Azure Data Factory, Databricks, Unity Catalog, and PySpark, with a strong foundation in Delta Lake architecture and SQL. Adept at designing scalable, secure data models, optimizing ETL processes, and enabling seamless integration for downstream analytics and reporting.
● Developed Unix shell scripts to retrieve input files from an NFS mount and designed the data model for Hive tables.
● Created Hive external tables for new business implementations.
● Applied Spark transformations and ensured the data contained no special symbols.
● Designed comprehensive data models (Star and Snowflake schemas) to optimize reporting performance.
● Developed and deployed PySpark-based data pipelines to process encrypted banking data into actionable formats.
● Automated file ingestion processes and integrated them with Unity Catalog for real-time metadata tracking.
● Created and managed Hive external tables for seamless data integration and query performance improvement.
● Tuned Spark transformations using partitioning and broadcast joins, achieving a 30% reduction in processing times.
● Ensured compliance with data security standards by implementing robust access controls and auditing frameworks.
● Ensured data quality, integrity, and security throughout the data lifecycle by implementing data governance policies and compliance standards.
● Collaborated with data analysts and business stakeholders to understand their data needs and developed data models and solutions to meet them.
● Azure Synapse Analytics, Azure Databricks, Azure SQL Data Warehouse, Azure Data Lake Storage, SQL, ETL