I am a Data Engineer with hands-on experience at Amigos Software Solutions, specializing in designing optimized ETL pipelines and real-time data ingestion. I have a proven record of improving data quality through rigorous validation while collaborating with cross-functional teams, and I bring strong problem-solving skills along with expertise in Python and SQL for data manipulation and process optimization. My big data experience spans Hadoop, Kafka, Sqoop, and Oozie, which I have used for large-scale data processing and real-time streaming, and I am proficient with HDFS, MapReduce, and Hive, ensuring efficient data storage and querying within big data environments. In data engineering and ETL, I have designed and optimized pipelines using tools such as Informatica, IBM InfoSphere DataStage, Python, and SQL, ensuring smooth data transformation and integration across systems.
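To illustrate the kind of real-time ingestion work described above, the following is a minimal Python sketch of a Kafka consume loop using the kafka-python package; the topic name, broker address, and downstream handling are hypothetical placeholders rather than details of any specific production pipeline.

```python
# Minimal sketch of a real-time ingestion step, assuming the kafka-python
# package is installed and a hypothetical broker at localhost:9092
# serving a hypothetical topic named "events".
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "events",                                 # hypothetical topic name
    bootstrap_servers=["localhost:9092"],     # hypothetical broker address
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    record = message.value
    # In a real pipeline this record would be validated and landed in
    # HDFS or Hive; printing stands in for that step here.
    print(record)
```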
I have hands-on experience with leading cloud platforms such as AWS (including S3, Redshift, and EC2) and GCP, which I have leveraged to build scalable data storage, processing, and analytics solutions. My expertise also extends to data storage and management, where I have worked extensively with both SQL and NoSQL databases, including Oracle, DB2, MongoDB, and Netezza, tuning data models and queries for efficient processing.
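As a small illustration of the AWS side of this work, below is a sketch of staging a file in S3 and bulk-loading it into Redshift with a COPY statement; the bucket, cluster endpoint, credentials, table, and IAM role are all hypothetical placeholders.

```python
# Sketch of an S3-to-Redshift load, assuming boto3 and psycopg2 are
# installed; every name and credential below is a placeholder.
import boto3
import psycopg2

# Stage the extract in S3 (hypothetical bucket and key).
s3 = boto3.client("s3")
s3.upload_file("daily_extract.csv", "example-bucket", "staging/daily_extract.csv")

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder
    port=5439, dbname="analytics", user="etl_user", password="...",  # placeholder
)
with conn, conn.cursor() as cur:
    # COPY is Redshift's bulk-load path from S3; the IAM role is a placeholder.
    cur.execute("""
        COPY staging.daily_extract
        FROM 's3://example-bucket/staging/daily_extract.csv'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
        CSV IGNOREHEADER 1;
    """)
```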
I am highly skilled in data validation and quality assurance, writing complex SQL queries to detect and troubleshoot data quality issues and to verify the integrity and accuracy of data across environments. In programming and scripting, I am proficient in Python and SQL, using both to manipulate data, automate workflows, and build reliable data pipelines.
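The sketch below shows the style of SQL-driven validation described here, run from Python; the connection string, table, and column names are hypothetical, and each check is written so that a count of zero means the data is clean.

```python
# Minimal sketch of SQL-based data quality checks, assuming psycopg2
# and a reachable database; table and column names are hypothetical.
import psycopg2

CHECKS = {
    # Null surrogate keys in the staged orders table.
    "null_order_ids": "SELECT COUNT(*) FROM staging.orders WHERE order_id IS NULL",
    # Keys that appear more than once (wrapped so the check returns one count).
    "duplicate_order_ids": """
        SELECT COUNT(*) FROM (
            SELECT order_id FROM staging.orders
            GROUP BY order_id HAVING COUNT(*) > 1
        ) AS d
    """,
}

conn = psycopg2.connect("dbname=analytics user=etl_user")  # hypothetical DSN
with conn, conn.cursor() as cur:
    for name, sql in CHECKS.items():
        cur.execute(sql)
        bad = cur.fetchone()[0]
        # Any nonzero count flags a quality issue to triage before loading.
        print(f"{name}: {'PASS' if bad == 0 else f'FAIL ({bad} rows)'}")
```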
My data visualization experience includes using Tableau to turn complex datasets into intuitive dashboards and reports that deliver actionable insights. I also have experience in DevOps and automation, having worked with Git, Jenkins, and CI/CD pipelines to automate workflows, manage version control, and streamline data operations.
I excel at performance optimization, improving the efficiency, speed, and reliability of data systems while keeping cloud data processing cost-efficient. I am committed to continuous learning and innovation, staying up to date with emerging technologies in data engineering, and I am passionate about improving processes through the application of cloud technologies and big data frameworks.