Mrigonav Saikia

Data Scientist
Mechanicsburg, PA

Summary

I specialize in designing, building, and optimizing large-scale data architectures to drive strategic decision-making and operational efficiency. My journey has spanned diverse industries and roles, where I've leveraged advanced data analytics, big data technologies, and machine learning to tackle complex business challenges and unlock data-driven insights.

Currently, I work as a Data Scientist at JPMorgan Chase & Co., where I develop predictive models, automate resiliency testing frameworks, and build ETL pipelines that support enterprise resilience and operational continuity. My expertise spans Spark, Hadoop, Snowflake, AWS, and a range of data engineering and machine learning tools, enabling me to deliver scalable, robust, and efficient solutions.

My approach combines a strong foundation in data engineering with a passion for continuous learning and innovation. I focus on emerging technologies and best practices in data science, cloud computing, and data governance. Thriving in collaborative environments, I have a proven track record of leading projects that enhance system performance, improve data accuracy, and deliver transformative insights to support business resilience and strategic goals.

Work History

Data Scientist

JPMorgan Chase
Plano, Texas
03.2020 - Current

• Machine Learning Model Development: Designed and deployed machine learning models to predict system vulnerabilities, optimizing the organization's resilience strategies.
• Automated Resilience Frameworks: Developed and implemented the ART framework to automate resiliency testing, enabling the simulation of real-world disruption scenarios.
• Risk Analysis & Mitigation: Conducted risk assessments using statistical analysis and predictive modeling to identify potential points of failure in enterprise systems.
• Anomaly Detection: Designed algorithms for real-time anomaly detection in system performance metrics, enhancing disaster recovery mechanisms (see the sketch after this list).
• Cloud Integration: Migrated resilience models to cloud platforms like AWS and Azure, enabling real-time monitoring and analysis.
• Big Data Processing: Leveraged Hadoop and Spark for processing terabytes of data to assess enterprise-wide performance during simulated disruptions.
• Visualization Dashboards: Developed interactive dashboards in Tableau and Power BI to provide actionable insights into system resilience and risk factors.
• Disaster Recovery Optimization: Provided data-driven recommendations to improve disaster recovery plans and reduce downtime during incidents.
• Real-Time Monitoring Solutions: Created systems for monitoring critical infrastructure health in real-time, ensuring rapid response to anomalies.
• Collaboration with Stakeholders: Partnered with IT infrastructure, risk management, and business continuity teams to align analytics solutions with organizational goals.
• Simulation Modeling: Built simulation models for testing the impact of infrastructure failures on enterprise operations, improving predictive accuracy.
• Regulatory Compliance Analysis: Ensured resilience testing and analytics complied with financial industry regulations and internal governance policies.
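
Illustrative sketch for the anomaly-detection bullet above: a minimal, hypothetical example of flagging outliers in system performance metrics with scikit-learn's IsolationForest. The column names and contamination rate are assumptions, not the production implementation.

    # Minimal sketch: flag anomalous system-performance metrics with an
    # unsupervised model. Column names and contamination rate are hypothetical.
    import pandas as pd
    from sklearn.ensemble import IsolationForest

    def flag_anomalies(metrics: pd.DataFrame) -> pd.DataFrame:
        """Label each metrics row with an is_anomaly flag."""
        features = metrics[["cpu_util", "latency_ms", "error_rate"]]  # assumed columns
        model = IsolationForest(contamination=0.01, random_state=42)
        labeled = metrics.copy()
        labeled["is_anomaly"] = model.fit_predict(features) == -1  # -1 marks outliers
        return labeled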

Data Engineer

Discover
Riverwoods, IL
10.2018 - 02.2020
  • Worked on the Snowflake Shared Technology Environment, providing stable infrastructure, a secured environment, reusable generic frameworks, robust design architecture, technology expertise, best practices, and automated SCBD (Secured Database Connections, Code Review, Build Process, Deployment Process) utilities
  • Analyzed and classified multiple biological conditions using Azure Machine Learning techniques such as PCA, PLS-DA, R-SVM, and RF
  • Designed ETL processes using the Pentaho tool to load data from sources to targets with transformations
  • Developed Pentaho big data jobs to load high volumes of data into an S3 data lake and then into a Redshift data warehouse
  • Migrated data from the Redshift data warehouse to the Snowflake database
  • Built dimensional models and a data vault architecture on Snowflake
  • Built a scalable, distributed Hadoop cluster running Hortonworks Data Platform (HDP 2.6)
  • Developed Spark code using Scala and Spark SQL for faster testing and processing of data, and explored optimizations using SparkContext, Spark SQL, and pair RDDs
  • Serialized JSON data and stored it in tables using Spark SQL
  • Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations to build the common learner data model, and store the results in a NoSQL store (HBase); see the sketch after this list
  • Worked with the Spark framework for both batch and real-time data processing
  • Used Spark MLlib for predictive intelligence and customer segmentation, and for smooth maintenance in Spark Streaming
  • Developed Spark Streaming programs that take data from Kafka and push it to different targets
  • Loaded data from different sources (Teradata, DB2, Oracle, and flat files) into HDFS using Sqoop and then into partitioned Hive tables
  • Created Pig scripts and wrapped them as shell commands to provide aliases for common operations in the project's business flow
  • Implemented partitioning and bucketing in Hive for better organization of the data
  • Created Hive UDFs to hide or abstract complex, repetitive rules
  • Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables
  • Developed Bash scripts to pull log files from an FTP server and process them for loading into Hive tables
  • Scheduled all Bash scripts using the Resource Manager Scheduler
  • Developed MapReduce programs to apply business rules to the data
  • Developed a NiFi workflow to pick up data from the data lake and from servers and send it to a Kafka broker
  • Loaded and transformed large sets of structured data from router locations to the EDW using an Apache NiFi data pipeline
  • Implemented a Kafka event log producer that publishes the Hadoop cluster's logs to a Kafka topic consumed by the ELK (Elasticsearch, Logstash, Kibana) stack for analysis
  • Implemented Apache Kafka as a replacement for a more traditional message broker (JMS Solace) to reduce licensing costs, decouple processing from data producers, and buffer unprocessed messages
  • Implemented a receiver-based Spark Streaming approach in Python, linking with the StreamingContext and handling proper closing and waiting stages
  • Implemented rack topology scripts for the Hadoop cluster
  • Resolved issues related to the old Hazelcast EntryProcessor API
  • Used the Akka toolkit with Scala for several builds
  • Worked with the Talend Administration Console, Talend installation, and context and global map variables in Talend
  • Used dashboard tools like Tableau
  • Used the Talend Administration Console Job Conductor to schedule ETL jobs on daily and weekly bases
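
Illustrative sketch for the Kafka-to-Spark streaming bullets above: a minimal PySpark Structured Streaming job (used here as a stand-in for the receiver-based DStream code described) that reads events from a Kafka topic, parses the JSON payload, and writes the result out. The broker address, topic, schema, and paths are hypothetical.

    # Minimal PySpark sketch of the Kafka -> transform -> sink pattern described
    # above. Broker, topic, schema, and output paths are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, LongType

    spark = SparkSession.builder.appName("learner-events").getOrCreate()

    schema = StructType([
        StructField("learner_id", StringType()),
        StructField("event_type", StringType()),
        StructField("event_ts", LongType()),
    ])

    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
        .option("subscribe", "learner-events")              # assumed topic
        .load()
        .select(from_json(col("value").cast("string"), schema).alias("e"))
        .select("e.*")
    )

    # Writing to Parquet here; an HBase sink would need a separate connector.
    (events.writeStream
        .format("parquet")
        .option("path", "/data/common_learner_model")              # assumed path
        .option("checkpointLocation", "/chk/common_learner_model")
        .start())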

Data Engineer

Capital One
Richmond, VA
01.2017 - 09.2018
  • Developed highly optimized Spark applications to perform data cleansing, validation, transformation, and summarization activities
  • Built data pipelines consisting of Spark, Hive, Sqoop, and custom-built input adapters to ingest, transform, and analyze operational data
  • Created Spark and Hive jobs to summarize and transform data
  • Used Spark for interactive queries, streaming data processing, and integration with popular NoSQL databases for huge volumes of data
  • Converted Hive/SQL queries into Spark transformations using Spark DataFrames and Scala
  • Used different tools for data integration with various databases and Hadoop
  • Built real-time data pipelines by developing Kafka producers and Spark Streaming applications to consume them
  • Leveraged Apache Airflow to orchestrate the execution of machine learning models for fraud detection and prevention (see the sketch after this list)
  • Scheduled Airflow workflows to trigger model training, validation, and deployment tasks based on predefined schedules or event-driven triggers
  • Integrated Airflow with model training pipelines implemented in frameworks such as TensorFlow or scikit-learn
  • Utilized Airflow's workflow templating and parameterization features to dynamically configure model training experiments and hyperparameters
  • Ingested syslog messages, parsed them, and streamed the data to Kafka
  • Imported data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and then loaded the data into HDFS
  • Exported the analyzed data to relational databases using Sqoop for further visualization and report generation by the BI team
  • Collected and aggregated large amounts of log data using Flume and staged it in HDFS for further analysis
  • Analyzed the data with Hive queries (HiveQL) to study customer behavior
  • Helped DevOps engineers deploy code and debug issues
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
  • Developed Hive scripts in HiveQL to denormalize and aggregate the data
  • Scheduled and executed workflows in Oozie to run various jobs
  • Implemented business logic in Hive and wrote UDFs to process the data for analysis
  • Addressed issues arising from the huge volume of data and transitions
  • Tracked and documented operational problems, following standards and procedures, using JIRA
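
Illustrative sketch for the Airflow bullets above: a minimal DAG wiring a train -> validate -> deploy sequence on a daily schedule, assuming Airflow 2.x. The DAG id, schedule, and task bodies are hypothetical placeholders rather than the actual fraud-detection pipeline.

    # Minimal Airflow 2.x sketch of a scheduled train -> validate -> deploy flow.
    # DAG id, schedule, and task bodies are hypothetical.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def train_model(**context):
        pass  # e.g. fit a scikit-learn or TensorFlow fraud model here

    def validate_model(**context):
        pass  # e.g. score a holdout set and check metrics against a threshold

    def deploy_model(**context):
        pass  # e.g. promote the approved model artifact to serving

    with DAG(
        dag_id="fraud_model_pipeline",      # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        train = PythonOperator(task_id="train", python_callable=train_model)
        validate = PythonOperator(task_id="validate", python_callable=validate_model)
        deploy = PythonOperator(task_id="deploy", python_callable=deploy_model)
        train >> validate >> deploy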

ETL/Data Analyst

GGK Tech
Hyderabad, India
01.2014 - 07.2015
  • Worked on different modules of enterprise data warehousing, supply chain, and .com projects
  • Served as a business analyst, interacting with product and technical managers of the source systems to define business transformation logic
  • Participated in the design, development, testing, documentation, and implementation of the project
  • Worked across the full project life cycle: design, data modeling, requirements gathering, unit/QA testing, and production
  • Served as the onsite coordinator for offshore team communication
  • Designed parallel, partitioned Ab Initio graphs using GDE components for a high-volume data warehouse
  • Worked with continuous components and XML components
  • Used Teradata SQL Assistant to interface with Teradata
  • Implemented Control-M as the primary scheduling and automation tool for managing data warehouse ETL processes
  • Scheduled Control-M jobs to extract data from source systems, transform it according to business rules, and load it into the data warehouse
  • Orchestrated complex data workflows spanning multiple systems and environments using Control-M's workflow orchestration capabilities
  • Integrated Control-M with version control systems and deployment pipelines for seamless CI/CD (Continuous Integration/Continuous Deployment) of ETL code and configurations

Education

M.S - Information Technology & Cybersecurity

New England College, Henniker, NH
08.2024

M.S - Data Analytics

New England College, Henniker, NH
12.2022

M.S - Systems & Engineering Management

Texas Tech University, Lubbock, TX
12.2017

B.Tech -

National Institute of Technology, Silchar, India
12.2014

Skills

  • Hadoop
  • Big Data
  • HDFS
  • MapReduce
  • Yarn
  • HBase
  • Pig
  • Hive
  • Sqoop
  • Flume
  • Oozie
  • Zookeeper
  • Splunk
  • Hortonworks
  • Cloudera
  • SQL
  • Python
  • R
  • Scala
  • Spark
  • Linux shell scripts
  • RDBMS
  • MySQL
  • DB2
  • MS-SQL Server
  • Teradata
  • PostgreSQL
  • NoSQL
  • MongoDB
  • Cassandra
  • Snowflake
  • Tableau
  • Spyder
  • SSIS
  • Informatica Power Center
  • Pentaho
  • Talend
  • Microsoft Visio
  • ER Studio
  • Erwin
  • R-tidyr
  • Tidyverse
  • Dplyr
  • Reshape
  • Lubridate
  • Beautiful Soup
  • Numpy
  • Scipy
  • Matplotlib
  • Python-twitter
  • Pandas
  • Scikit-learn
  • Keras
  • Regression
  • Clustering
  • MLlib
  • Linear Regression
  • Logistic Regression
  • Decision Tree
  • SVM
  • Naive Bayes
  • KNN
  • K-Means
  • Random Forest
  • Gradient Boost
  • Adaboost
  • Neural Networks
  • Time Series Analysis
  • Machine Learning
  • Deep Learning
  • Data Warehouse
  • Data Mining
  • Data Analysis
  • Big data
  • Visualizing
  • Data Munging
  • Data Modelling
  • SnowSQL
  • Amazon Web Services
  • AWS
  • Microsoft Azure
  • Google Cloud Platform
  • GCP
  • EMR
  • EC2
  • S3
  • RDS
  • Cloud Search
  • Redshift
  • Data Pipeline
  • Lambda
  • JIRA
  • MS Excel
  • Power BI
  • QlikView
  • Qlik Sense
  • D3
  • SSRS
  • Pycharm
  • Agile
  • Scrum
  • Waterfall

Languages

English: Full Professional
Hindi: Full Professional
Assamese: Native/Bilingual

Certification

  • Data Science A-Z Hands-on Exercises and ChatGPT Prize [2024]
  • Advanced Python: Working With Data
  • Advanced Snowflake: Deep Dive Cloud Data Warehousing and Analytics
  • Machine Learning A-Z: Hands-on Python & R in Data Science
  • Amazon Web Services: Data Services
  • Apache Spark Essential Training: Big Data Engineering
  • Data Engineering Pipeline Management with Apache Airflow
  • End-to-End Data Engineer: Python for Data Science with Real Exercises

Work Preference

Work Type

Contract Work, Full Time

Work Location

Remote

Important To Me

Company Culture, Healthcare benefits, Work from home option, Stock Options / Equity / Profit Sharing, Team Building / Company Retreats, Career advancement, Personal development programs

Work Availability

Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
Morning, Afternoon, Evening

Timeline

Data Scientist - JPMorgan Chase
03.2020 - Current
Data Engineer - Discover
10.2018 - 02.2020
Data Engineer - Capital One
01.2017 - 09.2018
ETL/Data Analyst - GGK Tech
01.2014 - 07.2015
New England College - M.S, Information Technology & Cybersecurity
New England College - M.S, Data Analytics
Texas Tech University - M.S, Systems & Engineering Management
National Institute of Technology - B.Tech,