Experienced in optimizing data pipelines and applications for improved performance and scalability, including configuring and installing Hadoop/Spark ecosystem components.
Developed and automated Databricks notebooks in SQL and Python, and configured Azure Databricks clusters for high concurrency to enable rapid preparation of high-quality data.
Environment: Hadoop, Spark, Spark Streaming, MapReduce, Hive, Pig, Oozie, Kafka, Storm, Scala, Java, Python, Sqoop, Talend, AWS (EMR, S3, CloudWatch), MongoDB, Solr, Hadoop Cluster, Azure Databricks, Linux.
Environment: Hadoop, Hive, Spark, MapReduce, HBase, Kafka, Flume, Azure Databricks, AWS, Azure Data Lake, Azure SQL, Azure DW, Scala, Python, Java, Shell Scripting, HDFS, ADLS Gen2, Cassandra, Teradata, NoSQL, Sqoop, Ambari, Azure Data Factory, Tableau, Ubuntu, Oracle 10g/11g/12c.
Environment: Spark, PySpark, AWS, S3, Glue, Redshift, DynamoDB, Hive, Spark SQL, Docker, Kubernetes, Airflow, GCP, ETL workflows.
Category : Skills
Big Data Technologies : HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Spark, Kafka, NiFi, Airflow, Flume, Snowflake, Ambari, Hue
Hadoop Frameworks : Cloudera CDH, Hortonworks HDP, MapR, Spark, Impala
Cloud Services : AWS (IAM, S3, EMR, EC2, Lambda, Route 53, CloudWatch, SNS), Azure, GCP
Programming Languages : SQL, Python, Scala, Java, C
Databases : Oracle (10g/11g), PL/SQL, MySQL, MS SQL Server 2012, DB2, Teradata, NoSQL, HBase, Cassandra, MongoDB, DynamoDB
Development Tools : Eclipse, NetBeans, IntelliJ, PyCharm, Jupyter, Databricks notebooks
Data Formats : JSON, Parquet, Avro, XML, CSV
Business Intelligence : Tableau, Power BI, Data Studio
Modeling Tools : Rational Rose, StarUML, Visual Paradigm for UML
Build Tools : Maven, Gradle, Jenkins
Operating Systems : Windows Vista/7/8/10, UNIX, Linux, Ubuntu, Mac OS X
Web Technologies : JDBC, JSP, Servlets, Struts (Tomcat, JBoss)
Methodologies : Agile, Waterfall