Dynamic and motivated IT professional with around 11 years of experience as a Data Engineer, with expertise in designing data-intensive applications using the Hadoop ecosystem, big data analytics, cloud data engineering, data warehouse/data mart, data visualization, reporting, and data quality solutions.
In-depth knowledge of Hadoop architecture and its components, including YARN, HDFS, NameNode, DataNode, JobTracker, ApplicationMaster, ResourceManager, TaskTracker, and the MapReduce programming paradigm.
Extensive experience in Hadoop-led development of enterprise-level solutions utilizing Hadoop components such as Apache Spark, MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Oozie, Flume, NiFi, Kafka, ZooKeeper, and YARN.
Profound experience in performing data ingestion and data processing (transformations, enrichment, and aggregations).
Strong knowledge of distributed systems architecture and parallel processing, with an in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
Experienced in using Spark to improve the performance and optimization of existing Hadoop algorithms with SparkContext, Spark SQL, the DataFrame API, Spark Streaming, and pair RDDs; worked extensively with PySpark and Scala (illustrated in the sketch following this summary).
Handled ingestion of data from different data sources into HDFS using Sqoop and Flume, performed transformations using Hive and MapReduce, and loaded the transformed data back into HDFS.
Managed Sqoop jobs with incremental loads to populate Hive external tables.
Experience in importing streaming data into HDFS using Flume sources and sinks, and transforming the data using Flume interceptors.
Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
Designed and developed Power BI graphical and visualization solutions from business requirement documents, and planned interactive dashboards.
Utilized Azure PaaS services to analyze, plan, and develop modern data solutions that facilitate data visualization.
Assessed the application's current state in production and evaluated how a new implementation would affect existing business procedures.
Extracted, transformed, and loaded data from source systems into Azure data storage services and Azure Data Lake Analytics using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL.
Processed data in Azure Databricks after ingesting it into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, and Azure DW).
Created serverless.yml files to provision AWS resources.
Experience with partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
Experience with different file formats such as Avro, Parquet, ORC, JSON, and XML.
Expertise in creating, debugging, scheduling, and monitoring jobs using Control-M and Oozie.
Hands-on experience in handling database issues and connections with SQL and NoSQL databases such as MongoDB, HBase, Cassandra, SQL Server, and PostgreSQL.
Created Java applications to handle data in MongoDB and HBase; used Apache Phoenix to create a SQL layer on HBase.
Experience in developing enterprise-level solutions using batch processing (Apache Pig) and streaming frameworks (Spark Streaming, Apache Kafka, and Apache Flink).
Migrated databases from SQL databases (Oracle and SQL Server) to NoSQL databases (Cassandra/MongoDB).
Experience in designing and creating RDBMS tables, views, user-defined data types, indexes, stored procedures, cursors, triggers, and transactions.
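As a rough illustration of the Spark and Hive work summarized above, the following is a minimal PySpark sketch; the HDFS paths, database, and table/column names are hypothetical placeholders rather than actual project artifacts.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hive support is needed to write managed/external Hive tables.
    spark = (
        SparkSession.builder
        .appName("orders-enrichment")  # hypothetical job name
        .enableHiveSupport()
        .getOrCreate()
    )

    # Read raw data previously landed in HDFS (e.g. by Sqoop or Flume ingestion).
    orders = spark.read.parquet("hdfs:///data/raw/orders")        # hypothetical path
    customers = spark.read.parquet("hdfs:///data/raw/customers")  # hypothetical path

    # Transformation, enrichment, and aggregation with the DataFrame API.
    daily_totals = (
        orders.join(customers, "customer_id")
        .withColumn("order_date", F.to_date("order_ts"))
        .groupBy("order_date", "region")
        .agg(
            F.sum("amount").alias("total_amount"),
            F.countDistinct("customer_id").alias("unique_customers"),
        )
    )

    # Write the result as a date-partitioned Hive table to optimize downstream queries.
    (
        daily_totals.write
        .mode("overwrite")
        .partitionBy("order_date")
        .format("parquet")
        .saveAsTable("analytics.daily_order_totals")  # hypothetical database.table
    )

    spark.stop()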
Configured alerting rules and set up PagerDuty alerting in Grafana for Kafka, ZooKeeper, Druid, Cassandra, Spark, and various microservices.
Set up and maintained logging and monitoring subsystems using tools like Elasticsearch, Fluentd, Kibana, Prometheus, Grafana, and Alertmanager.
Expert in designing ETL data flows by creating mappings/workflows to extract data from SQL Server, and in data migration and transformation from Oracle, Access, and Excel sheets using SQL Server SSIS.
Expert in designing parallel jobs using various stages such as Join, Merge, Lookup, Remove Duplicates, Filter, Dataset, Lookup File Set, Complex Flat File, Modify, Aggregator, and XML.
Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, EMR, and other services of the AWS family.
Created and configured new batch jobs in the Denodo scheduler with email notification capabilities; implemented cluster settings for multiple Denodo nodes and set up load balancing to improve performance.
Instantiated, created, and maintained CI/CD (continuous integration and deployment) pipelines, applying automation to environments and applications; worked with various automation tools like Git, CFT, and Ansible.
Experienced with JSON-based RESTful web services and XML-based SOAP web services; worked on various applications using Python IDEs like Sublime Text and PyCharm.
Built and productionized predictive models on large datasets utilizing advanced statistical modeling, machine learning, and other data mining techniques.
Developed intricate algorithms based on deep-dive statistical analysis and predictive data modeling that were used to deepen relationships, strengthen longevity, and personalize interactions with customers.
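A minimal sketch of the predictive-modeling workflow mentioned above, using scikit-learn with a synthetic stand-in dataset; in practice the features would come from the data pipelines described earlier, and the model choice here is only an assumed baseline.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for a prepared customer feature set.
    X, y = make_classification(n_samples=10_000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Fit a simple baseline classifier and score it before productionizing.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    print("ROC AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))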
Databricks Certified Data Engineer Professional
SnowPro Advanced Data Engineer
AWS Certified Data Engineer
Certified Azure Data Engineer Associate