· Implemented and managed data governance policies to ensure data accuracy, consistency, and integrity within Hadoop clusters.
· Collaborated with data engineers to design and implement ETL processes for efficient data ingestion, transformation, and loading into Hadoop clusters.
· Utilized workflow orchestration tools such as Apache Airflow to schedule and automate data processing workflows.
· Conducted capacity planning to anticipate resource requirements and scale Hadoop clusters accordingly, ensuring optimal performance.
· Worked closely with data scientists to understand analytical requirements and support the deployment of machine learning models on Hadoop clusters.
· Implemented data catalog solutions to manage metadata and facilitate data discovery, enabling metadata-driven data governance practices.
· Stayed abreast of industry best practices and emerging technologies beyond Hadoop, such as Apache Flink or Apache Beam, to enhance data processing capabilities.
· Provided training and documentation to educate users on self-service capabilities within the Hadoop ecosystem, promoting efficient utilization of resources.
· Demonstrated 5+ years of experience in technical support and consulting, with a strong foundation in IT and Big Data solutions, including extensive hands-on experience with Microsoft SQL Server, Azure Data Factory, and Hadoop ecosystems.
· Proficient in cloud technologies and services, particularly Azure HDInsight, Azure Databricks, and Azure Cosmos DB, with a deep understanding of NoSQL services and MongoDB, as well as expertise in Data Lake and cloud streaming technologies.
· Designed, implemented, and managed Hadoop infrastructure and data ecosystem, ensuring reliability and performance.
· Collaborated with cross-functional teams to optimize data pipelines and meet data requirements.
· Administered and monitored Hadoop clusters, diagnosing and resolving issues to maintain availability.
· Implemented security measures including authentication, authorization, and encryption within Hadoop clusters.
· Collaborated on defining and implementing backup and disaster recovery strategies for Hadoop clusters.
· Optimized Hadoop performance through configuration fine-tuning and capacity planning.
· Worked with DevOps teams to automate Hadoop infrastructure provisioning and management processes.
· Stayed updated with Hadoop ecosystem developments and recommended new technologies to enhance the platform.
· Documented Hadoop infrastructure configurations, processes, and best practices.
· Provided technical support and guidance to team members and stakeholders.
· Expertise in the open-source ecosystem, including Linux and Apache, enhancing the ability to deploy, manage, and troubleshoot complex data solutions across a variety of environments.
· Strong BI background with substantial experience in ETL processes, data warehouse management, data mining, and the development of reporting solutions, underpinned by a thorough grasp of data querying and manipulation.
· Integrated machine learning and deep learning algorithms using Python.
· Developed Unix shell scripts to automate execution of HQL files and file transfers to client servers.
· Designed and implemented tooling for fully automating deployment and configuration of Cloudera Hadoop clusters.
· Built workflow orchestration using Azkaban and shell scripting.
· Implemented backup and restore functionality to safeguard data stored within Hadoop clusters and ensure business continuity in the event of system failures.
· Collaborated with security teams to implement comprehensive security measures, including role-based access control (RBAC) and encryption, to protect sensitive data within Hadoop clusters.
· Adept at problem solving and troubleshooting, with a proven track record of using diverse data collection tools and methodologies to analyze problems, identify root causes, and devise effective solutions.
· Committed to continuous learning and skill enhancement, particularly in areas such as Azure Cosmos DB fundamentals and Microsoft Azure Platform Services, ensuring the delivery of high-quality, timely technical expertise to address business-critical challenges.
· Automated Java version patch upgrades using shell scripts.
· Built and deployed data ingestion pipelines into a data lake using HDFS, Apache Sqoop, and Apache Hive.
· Analyzed Azkaban failed flow logs and application logs for root cause analysis.
· Orchestrated workflows and data pipelines using Azkaban and shell scripting.
· Created and maintained CI/CD pipelines using Jenkins, Rundeck, and Ansible.
· Automated infrastructure patch cycles and OS installations using Ansible.
· Developed PowerShell scripts to run security audits against Active Directory.
· Automated client deployments using Jenkins and TeamCity.
· Managed Kubernetes clusters for microservices.
Snowflake SnowPro
https://www.credly.com/badges/d6aa0615-991e-4ec9-9e2b-c82e3d9ee50b/public_url
Microsoft Certified: Azure Administrator Associate
Microsoft Certified: Azure Solutions Architect Expert