Experienced Data Engineer with over 5 years of experience designing, deploying, and optimizing large-scale data solutions and enterprise applications. Proven expertise in the Hadoop ecosystem (HDFS, MapReduce, Hive, Sqoop, Flume), cloud-based ETL processes (AWS Glue, Azure Data Factory, IDMC/IICS), and real-time data pipelines using Spark, Kafka, and StreamSets. Adept at creating and maintaining data lakes, data warehouses (AWS Redshift, Snowflake, Azure SQL), and data catalogs to ensure high data accuracy, quality, and accessibility. Skilled in automation with Python, Airflow, and Terraform, and in managing both relational (Oracle, SQL Server, MySQL) and NoSQL (MongoDB, Cassandra, HBase) databases. Advanced understanding of machine learning algorithms and statistical tools (Pandas) for data analysis, with a solid background in CI/CD and DevOps practices for optimized deployment. Enthusiastic about cloud adoption, open-source data engineering, and exploring ML and automation-driven solutions. Strong background in data architecture and pipeline development, with a track record of streamlining data processes and improving data integrity. Advanced proficiency in SQL and Python, applied to support cross-functional teams and drive data-driven decision-making.
Designed and implemented ETL processes in AWS Glue, migrating and transforming data from multiple sources (e.g., S3 and text files) into AWS Redshift, improving data accuracy by 30% and reducing reporting time by 20%
Designed a data lake workflow in the Hadoop ecosystem, enabling seamless integration with Tableau for reporting, improving data accessibility for stakeholders by 40% and enhancing reporting efficiency by 30%
Microsoft Azure Fundamentals