
Experienced Big Data Engineer with expertise in advanced Apache Spark processing using Scala, Python, and Java. Proficient in Spark Streaming, the DataFrame API, and Spark SQL for large-scale data processing. Strong background in SQL performance tuning with Hive/Impala and in dashboarding with Elasticsearch and Kibana. Skilled in integrating Spark streaming jobs with Apache Kafka and Amazon Kinesis. Expertise in building automated ETL pipelines with a focus on data flow, error handling, and recovery. Knowledgeable in setting up and tuning Spark clusters on YARN, Mesos, and standalone deployments. Experienced with AWS services such as EMR, Glue, S3, Athena, and Lambda, with a solid understanding of data warehousing, physical table design, and job scheduling tools such as Airflow and AWS Data Pipeline.
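For illustration, a minimal Scala sketch of the Spark Structured Streaming plus Kafka integration described above. The broker address (localhost:9092), topic name (events), and local master setting are placeholder assumptions, and the spark-sql-kafka connector is assumed to be on the classpath; this is a sketch, not a production pipeline.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, window}

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    // Local session for demonstration only; a real job would run on YARN, Mesos, or EMR.
    val spark = SparkSession.builder()
      .appName("kafka-stream-sketch")
      .master("local[*]")
      .getOrCreate()

    // Read a stream of records from a hypothetical "events" topic on a local broker.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()

    // Kafka delivers the value as binary; cast it to a string and keep the ingestion timestamp.
    val counts = events
      .selectExpr("CAST(value AS STRING) AS payload", "timestamp")
      .groupBy(window(col("timestamp"), "1 minute"))
      .count()

    // Write per-minute counts to the console; a real pipeline would write to S3, Hive, or Elasticsearch.
    val query = counts.writeStream
      .outputMode("update")
      .format("console")
      .option("truncate", "false")
      .start()

    query.awaitTermination()
  }
}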
Competencies:
Hadoop Cluster Setup, Hive, Pig, Sqoop, Oozie, Spark, Data Transformation, Data Analysis, Data Visualization, ETL Processes, Data Pipelines, SQL Performance Tuning, Real-time Data Processing, Workflow Automation, Big Data Analytics, Cloudera Manager
Programming Tools & Languages:
Tableau, Power BI, Python Visualizations, Excel Dashboards, T-SQL, Java, Scala, PL/SQL, SQL, C, XML, HTTP, MATLAB, DAX, Python, R, SAS Enterprise Miner, SAS, SQL Server, MS Access, Oracle, Teradata, Cassandra, Neo4j, MongoDB, Git, GitHub, Anaconda Navigator, Jupyter Notebook, Azure Data Factory, Azure Databricks, Azure Analysis Services, Looker, Smart View, Nexus