
With over 11 years of experience as a Data Engineer, I specialize in implementing comprehensive Big Data solutions and architecting Hadoop ecosystems. My expertise spans Apache Hadoop components such as HDFS, MapReduce, YARN, Hive, Sqoop, and HBase, with a strong command of Spark for both ETL and real-time data processing. My technical background includes optimizing Hive performance, troubleshooting complex queries, and managing large-scale data environments on both AWS (EMR, S3, Redshift) and Azure (HDInsight, Databricks, Data Lake).

I have successfully leveraged Kafka for real-time data pipelines and AWS Glue for automated ETL, alongside managing enterprise data lakes for structured and unstructured data analysis. I work fluently with Spark's RDD, DataFrame, and Dataset APIs, and use cloud-native services such as AWS Glue and Athena for streamlined data extraction and transformation. My skill set includes NoSQL databases such as HBase and Cassandra, enabling seamless integration within Hadoop clusters, and extends to file formats such as Avro, ORC, and Parquet. I have worked with multiple Hadoop distributions (Cloudera, Hortonworks) and used scheduling tools such as Oozie and Airflow to automate workflows efficiently.

Beyond Big Data, I have a solid foundation in core programming languages, including Java and JavaScript, which has equipped me to design scalable, secure, and efficient RESTful APIs. I have led several data integration efforts within Agile environments using JIRA and Confluence, and possess strong data modeling skills for both OLTP and OLAP systems, with experience in SQL, PySpark, and backend database analysis. This well-rounded expertise enables me to drive high-performance data engineering initiatives, from cloud data migration to advanced analytics and data pipeline optimization.