MLflow Deletion Explained
MLflow is a popular open-source library for machine learning model lifecycle management. I’ve primarily used it as a model and experiment registry, although it has other functionality that supports model deployment (there are much better solutions out there for this) and LLM workflows (I don’t like the way the project is headed here, but I digress).
As a single place to track, version, and package ML artifacts, it does the job… mostly. My current beef with MLflow is that it has one of the dumbest ways of handling artifact deletion that I’ve seen. If you’ve ever wondered “how can I permanently delete something from MLflow,” this post details the process involved.
MLflow architecture summary
Skip this section if you already know how MLflow works.
MLflow has 3 core entities that matter here: runs, experiments, and registered models.
- A run is a single execution which logs parameters, metrics, tags and artifacts associated with a trained model.
- An experiment is a grouping of runs.
- A registered model is a named reference (a pointer) to a model artifact produced by a run. MLflow lets you version, stage, and describe the reference, but the underlying data always belongs to a run.
When you log a run, MLflow stores data in 2 places:
- Backend store (database) - this holds metadata: experiment names, run IDs, parameters, metrics, tags, etc.
- Artifact store (blob storage) - this holds the actual files corresponding to the run.
These two layers are managed by the tracking server, which is the API/UI layer that you interact with. When you instantiate an MlflowClient(), you are interacting with the tracking server. You don’t talk to the DB or blob storage directly. This split, and the fact that registered models are pointers, is what makes deletion behave differently depending on what you are deleting.
Entity “deletion” is a soft-delete, except for registered models
Experiments and runs
When you delete an experiment or run, whether through the API, CLI, or UI, MLflow does not actually remove anything. It flips a lifecycle_stage field in the database from active to deleted, which stops the experiment/run from being returned in results. Both the backend entities and the artifacts still exist.
from mlflow import MlflowClient
# MlflowClient exposes both the delete and the restore calls
client = MlflowClient()
# These are all soft deletes; nothing is actually removed...
client.delete_experiment(experiment_id="1")
client.delete_run(run_id="abc123")
# ...which means they're reversible!
client.restore_experiment(experiment_id="1")
client.restore_run(run_id="abc123")
This leads to annoying behaviour, such as not being able to name a new experiment after a previously “deleted” experiment. You might say that a soft-delete is preferable in case of accidental deletion. I’d be fine with soft-deletion as a default behaviour if there were some way to hard-delete through the tracking server. Plus, a retention policy can just be set on blob storage to prevent any catastrophic deletions from happening; I’d much rather handle things at that level instead of having this come from MLflow.
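To make the soft-delete mechanics concrete, here’s a minimal sketch using Python’s built-in sqlite3. The table is a toy stand-in I made up for illustration, not MLflow’s real schema, and the experiment name churn-model and the helper functions are hypothetical. It assumes experiment names carry a uniqueness constraint, which is what the naming collision described above suggests.

```python
import sqlite3

# Toy stand-in for the backend store: one simplified table, NOT MLflow's real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE experiments (
        experiment_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE,
        lifecycle_stage TEXT NOT NULL DEFAULT 'active'
    )
    """
)
conn.execute("INSERT INTO experiments (name) VALUES ('churn-model')")


def soft_delete(name: str) -> None:
    """Flip lifecycle_stage to 'deleted'; the row itself stays in the table."""
    conn.execute(
        "UPDATE experiments SET lifecycle_stage = 'deleted' WHERE name = ?",
        (name,),
    )


def list_active() -> list[str]:
    """Return the names a normal listing would show (active rows only)."""
    rows = conn.execute(
        "SELECT name FROM experiments WHERE lifecycle_stage = 'active'"
    )
    return [name for (name,) in rows]


def try_create(name: str) -> bool:
    """Try to insert a new experiment; a leftover soft-deleted row blocks the name."""
    try:
        conn.execute("INSERT INTO experiments (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False


soft_delete("churn-model")
total_rows = conn.execute("SELECT COUNT(*) FROM experiments").fetchone()[0]
print(list_active(), total_rows, try_create("churn-model"))
```

After the soft delete, the experiment disappears from the active listing, but the row is still in the table and still holds the name, so creating a new experiment with that name fails.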
Registered models
As explained above, registered models are pointers to a run with additional metadata attached. Deleting a registered model really does delete the registered model’s metadata, but the associated run will persist.
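A toy in-memory model shows why this is the case. This is purely illustrative: the ToyStore class and its methods are my own invention and not MLflow’s API. It only mirrors the idea that registering a model stores a reference to a run and does not copy the run’s data.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """In-memory stand-in for the tracking server's state (illustration only)."""

    runs: dict[str, dict] = field(default_factory=dict)  # backend store metadata
    artifacts: dict[str, bytes] = field(default_factory=dict)  # artifact store files
    registered_models: dict[str, str] = field(default_factory=dict)  # name -> run_id

    def log_run(self, run_id: str, model_bytes: bytes) -> None:
        # A run writes to both stores: metadata and the actual file.
        self.runs[run_id] = {"lifecycle_stage": "active"}
        self.artifacts[run_id] = model_bytes

    def register_model(self, name: str, run_id: str) -> None:
        # Only a reference is stored; the model bytes are not copied.
        self.registered_models[name] = run_id

    def delete_registered_model(self, name: str) -> None:
        # Removing the pointer is a real delete, but it touches nothing else.
        del self.registered_models[name]


store = ToyStore()
store.log_run("abc123", b"model-weights")
store.register_model("fraud-detector", "abc123")
store.delete_registered_model("fraud-detector")
print(store.registered_models, list(store.runs), list(store.artifacts))
```

The registered model is gone for good, while the run’s metadata and its artifact are untouched. Getting rid of those still means going through the run/experiment deletion path.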
Hard-deletes cannot be targeted
In order to hard delete experiments and runs, you need to use the MLflow CLI’s mlflow gc command with direct access to the backend database - there’s no way to do this through the tracking server.
One thing to note is that mlflow gc has two independent cleanup targets: the backend store (database) and the artifact store (blob storage). If you omit the --artifacts-destination flag, it will only purge metadata from the database, leaving the actual files in blob storage untouched. To fully clean up, you need to pass both:
# Purges metadata only - artifacts will remain in blob storage
mlflow gc --backend-store-uri postgresql://user:pass@host:5432/mlflow
# Purges metadata and artifacts
export MLFLOW_TRACKING_URI=http://localhost:8000
mlflow gc \
--backend-store-uri postgresql://user:pass@host:5432/mlflow \
--artifacts-destination s3://your-mlflow-bucket
The MLflow docs say that you can pass --run-ids and --experiment-ids in order to delete specific runs/experiments:
# Delete specific runs by ID (they must be in deleted state)
mlflow gc --run-ids 'run1,run2,run3'
# Delete all runs in specific experiments (experiments must be in deleted state)
mlflow gc --experiment-ids 'exp1,exp2'
You would think that this means that command #1 performs a targeted hard-delete on runs 1-3, but this is not the case. Instead, it deletes everything with lifecycle_stage=deleted, in addition to the run_ids specified. This seems like a clear bug to me, since those run_ids would already have been included in the bulk garbage collection, but according to MLflow this is the desired behaviour.
The consequence of this is that you can’t actually target hard-deletion through MLflow. If you need targeted hard deletion of a single experiment or run, you’d have to bypass MLflow entirely and run delete commands directly against the database and blob storage yourself. I’m too lazy for this, so in production I just treat mlflow gc as a scheduled cleanup job with a retention window:
# Purge anything soft-deleted more than 30 days ago
export MLFLOW_TRACKING_URI=http://localhost:8000
mlflow gc \
--backend-store-uri postgresql://user:pass@host:5432/mlflow \
--artifacts-destination s3://your-mlflow-bucket \
--older-than 30d
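If your scheduler runs Python rather than shell, the same job can be wrapped with subprocess. This is a rough sketch, not a recommended implementation: build_gc_command, run_gc, and the dry_run parameter are hypothetical helpers of mine (dry_run is not an mlflow gc flag), and the URIs are the same placeholders used above. The child process inherits the environment, so MLFLOW_TRACKING_URI would still need to be set wherever the job runs.

```python
import shlex
import subprocess


def build_gc_command(
    backend_store_uri: str, artifacts_destination: str, older_than: str
) -> list[str]:
    """Assemble the `mlflow gc` invocation shown above as an argument list."""
    return [
        "mlflow", "gc",
        "--backend-store-uri", backend_store_uri,
        "--artifacts-destination", artifacts_destination,
        "--older-than", older_than,
    ]


def run_gc(command: list[str], dry_run: bool = True) -> str:
    """Print the command; only execute it when dry_run is False."""
    rendered = shlex.join(command)
    print(rendered)
    if not dry_run:
        # check=True raises CalledProcessError if mlflow gc exits non-zero,
        # so the scheduler sees a failed job instead of a silent no-op.
        subprocess.run(command, check=True)
    return rendered


command = build_gc_command(
    backend_store_uri="postgresql://user:pass@host:5432/mlflow",
    artifacts_destination="s3://your-mlflow-bucket",
    older_than="30d",
)
rendered = run_gc(command)  # dry run: prints the command, executes nothing
```

Defaulting to a dry run is a deliberate choice here: since the purge is irreversible and can’t be targeted, I’d rather have to opt in to the real thing with dry_run=False.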
TL;DR: how to actually hard-delete things in MLflow
Registered models
Just delete these. Since they’re only pointers, deletion through the API/UI is a real delete.
from mlflow import MlflowClient
client = MlflowClient()
client.delete_registered_model(name="fraud-detector")
Experiments and runs
A two-step process:
- Soft delete through the API, CLI, or UI
- Run mlflow gc with both the backend store and artifact destination to fully clean up
# 1: soft delete (Python)
mlflow.delete_experiment(experiment_id="1")
# 2: bulk hard delete (shell)
export MLFLOW_TRACKING_URI=http://localhost:8000
mlflow gc \
--backend-store-uri postgresql://user:pass@host:5432/mlflow \
--artifacts-destination s3://your-mlflow-bucket \
--older-than 30d