Postgraduate research project

Risk management for robust Machine Learning Operations (MLOps)

Funding
Competition funded
Type of degree
Doctor of Philosophy
Entry requirements
2:1 honours degree
Faculty graduate school
Faculty of Engineering and Physical Sciences
Closing date

About the project

MLOps refers to the processes for developing and maintaining machine learning systems. Robust MLOps ensures reliability under uncertainty, adversarial conditions, and distributional shifts. Given the massive growth of ML-based projects across scientific fields, effectively managing risks to MLOps robustness has become essential.

The growing adoption of Machine Learning Operations (MLOps) frameworks for deploying ML systems in production has introduced complex, dynamic, and interconnected software supply chains. In this context, robustness, that is, the ability of systems to perform reliably under uncertainty, adversarial conditions, and distributional shifts, has become a defining criterion for trustworthy AI. As stated by Bayram and Ahmed (2025), building robust MLOps systems requires embedding robustness considerations across all processes:

  • automation of operations (validation, versioning, monitoring, updates)
  • DataOps (data cleaning, addressing distribution shifts and data scarcity, resource scheduling)
  • ModelOps (hyperparameter optimisation, model generalisability, coping with concept drift and label noise).

However, current MLOps practices lack tools and methodologies to systematically evaluate and manage risks to these processes, leaving production systems fragile, opaque, and difficult to assure throughout their life cycle. These limitations not only undermine trust and safety but also lead to operational inefficiencies, technical debt, and compliance challenges, particularly in high-stakes sectors such as healthcare, finance, and autonomous systems.

This project addresses the gap in assessing risk to MLOps robustness. It aims to operationalise robustness in MLOps through a risk-oriented framework, guided by the following research questions:

  • how can the robustness of MLOps pipelines be systematically assessed?
  • how can risks to the robustness of MLOps pipelines be systematically managed?

Several AI and ML modules are available for training.