- Over 10 years of professional IT experience, including 5+ years in the Big Data ecosystem covering ingestion, storage, querying, processing, and analysis of big data on Databricks and cloud platforms (AWS, Azure, GCP).
- Hands-on experience with Azure cloud services (PaaS and IaaS): Azure Databricks, Azure Synapse Analytics, Azure Cosmos DB, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure HDInsight, Key Vault, and Azure Data Lake for data ingestion, ETL processes, data integration, data migration, and AI solutions.
- Ingested data into Azure Blob Storage and processed it with Databricks; wrote Spark scripts and UDFs to perform transformations on large datasets.
- Experience working with Azure Blob and Data Lake storage and loading data into Azure Synapse Analytics.
- Experience building and maintaining multiple Hadoop clusters of different sizes and configurations.
- Created Databricks notebooks to streamline and curate data for various business use cases, and mounted Blob Storage on Databricks.
- Experience building data pipelines and processing large volumes of data with Azure Data Factory.
- Developed Python scripts for file validation in Databricks and automated the process using ADF.
- In-depth knowledge of Hadoop and Spark; experience with data mining and stream-processing technologies (Kafka, Spark Streaming).
- Expertise in Big Data architectures, including Hadoop distributions (Azure HDInsight, Hortonworks, Cloudera), MongoDB and other NoSQL stores, HDFS, and the MapReduce parallel-processing framework.
- Developed Spark-based applications to load streaming data with low latency using Kafka and PySpark (see the sketch after this list).
- Extensive hands-on experience tuning Spark jobs.
- Experienced in working with structured data using HiveQL and in optimizing Hive queries.
- Experience developing Big Data projects using open-source tools such as Hadoop, Hive, Flume, and MapReduce.
- Experience installing, configuring, supporting, managing, and monitoring Hadoop clusters using Apache and Cloudera distributions and on AWS.
- Experience writing MapReduce programs on Apache Hadoop to work with Big Data.
- Experience developing, supporting, and maintaining ETL (Extract, Transform, and Load) processes using Talend Integration Suite.
- Worked with BTEQ in a UNIX environment and executed TPT scripts from the UNIX platform.
- Worked on Teradata stored procedures and functions to conform data and load it into tables.
- Used Teradata MultiLoad and FastLoad utilities to load data from Oracle and SQL Server into Teradata.
- Wrote numerous BTEQ scripts to run complex queries against the Teradata database.
- Tuned SQL queries to resolve spool-space errors and improve performance.
- Strong hands-on experience with AWS services, including EMR, S3, EC2, Lambda, Glue, Redshift, Athena, and DynamoDB.
- Excellent working experience in Scrum/Agile and Waterfall project execution methodologies.
- Hands-on experience across the Hadoop ecosystem, including Spark, Kafka, HBase, Scala, Hive, Sqoop, Oozie, and Flume.
- Worked on Spark and Spark Streaming, using the core Spark API to build data pipelines.
- Experienced in scripting with Python and UNIX shell.
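As a concrete illustration of the Kafka + PySpark streaming work described above, the following is a minimal sketch of a Structured Streaming job; the broker address, topic name, payload schema, and output paths are hypothetical placeholders, and it assumes the spark-sql-kafka connector is on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Assumed JSON schema for messages on the topic (hypothetical).
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_time", TimestampType()),
    StructField("payload", StringType()),
])

# Read from Kafka with micro-batch Structured Streaming.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "events")                     # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers the message body as bytes; cast to string and parse the JSON.
events = (
    raw.select(from_json(col("value").cast("string"), schema).alias("e"))
       .select("e.*")
)

# Write the parsed stream to Parquet, with checkpointing for fault tolerance.
query = (
    events.writeStream.format("parquet")
    .option("path", "/tmp/streams/events")              # hypothetical output path
    .option("checkpointLocation", "/tmp/streams/_chk")  # hypothetical checkpoint
    .outputMode("append")
    .start()
)
query.awaitTermination()
```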
- Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2; successfully loaded files into HDFS from Oracle, SQL Server, Teradata, and Netezza using Sqoop.
- Installed and configured Apache Airflow for workflow management and created workflows in Python.
- Developed Python code for tasks, dependencies, an SLA watcher, and a time sensor for each job to manage and automate workflows with Airflow (see the sketch after this list).
- Experience in database design, entity relationships, and database analysis, and in programming SQL, PL/SQL stored procedures, packages, and triggers in Oracle.
- Experience working with different data sources such as flat files, XML files, and databases.
- Hands-on experience with Continuous Integration and Continuous Deployment (CI/CD).
- Strong communication and analytical skills; a good team player and quick learner, organized and self-motivated.
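To make the Airflow workflow pattern above concrete, here is a minimal sketch of a DAG combining a time sensor, task dependencies, and a per-task SLA; the DAG id, task names, schedule, and times are hypothetical placeholders.

```python
from datetime import datetime, time, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.sensors.time_sensor import TimeSensor

with DAG(
    dag_id="daily_file_load",            # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Time sensor: hold downstream tasks until 06:00 UTC each day.
    wait_until_6am = TimeSensor(task_id="wait_until_6am", target_time=time(6, 0))

    extract = BashOperator(task_id="extract", bash_command="echo extracting")

    # SLA watcher: Airflow records an SLA miss if this task has not finished
    # within 8 hours of the data interval end.
    load = BashOperator(
        task_id="load",
        bash_command="echo loading",
        sla=timedelta(hours=8),
    )

    # Task dependencies, expressed with bitshift operators.
    wait_until_6am >> extract >> load
```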
Transitioning from a data-centric environment with a focus on developing efficient data solutions and optimizing workflows. Skilled in data architecture, database management, SQL, and Python, with a track record of enhancing data-driven decision-making processes. Seeking to apply these transferable skills in a new field, bringing a consultative approach to solving complex problems and improving operational efficiency.
Python
Microsoft Certified: Azure Data Engineer Associate - Microsoft.