Logging a Random Forest model with MLflow

After training and tuning a model, it is essential to track experiments and save artifacts in a reproducible way. MLflow is an open-source platform that manages the full ML lifecycle: experiment tracking, model logging and deployment.

In this section, we will use MLflow to log the pre-trained RF model built in the dedicated section, along with its configuration tags, hyperparameters, evaluation metrics, diagnostic figures and the model artifact itself.

Key MLflow concepts you will use in this exercise are:

  • The MLflow tracking user interface (UI), which lets you compare runs, visualize metrics over time and retrieve any logged artifact.
  • The mlflow.start_run() context manager: all the logging calls in this exercise are made inside a single with mlflow.start_run() block, which automatically closes the run when the block exits.

Don’t forget to search the MLflow documentation for details on the functions you need.

1 Exercise 7: Log your RF model with MLflow

In this exercise, you will log the pre-trained RF model step by step, following the structure of the main training script. By the end, every artifact produced during training — figures, parameters, metrics and the model itself — will be registered in MLflow.

  1. Before starting, make sure that the MLflow tracking URI is set and that the experiment exists.
See the solution
import mlflow
import os

mlflow_server = os.getenv("MLFLOW_TRACKING_URI")  # environment variable pointing to your MLflow tracking server
mlflow.set_tracking_uri(mlflow_server)
mlflow.set_experiment("run_rf")
  1. Open a new MLflow run named run_rf.

Use mlflow.start_run() as a context manager (with block). All subsequent logging calls must be placed inside this block.

See the solution
with mlflow.start_run(run_name="run_rf"):
    ...  # all logging calls go here

Note: when testing the MLflow logging calls in the following instructions (3. to 10.), you must run the code inside this block; otherwise the logging will not work.

  1. Log tags to describe the context of the run: model type, task, data paths, target column, framework.

Use mlflow.set_tags() with a dictionary. Tags are free-form strings and are useful for filtering runs in the UI. You can pass any key-value pairs that describe the run context.

See the solution
    mlflow.set_tags({
        "model_type": "RandomForestRegressor",
        "task": "regression",
        "data_path": "s3://...",
        "target_col": y.name,
        "framework": "scikit-learn"
    })
  1. Log the training configuration parameters: dataset split ratio, model hyperparameters, dataset size and statistics.

Use mlflow.log_params() with a dictionary. Parameters are meant for scalar values that configure the run (integers, floats, strings). Here you should log: test_size, max_depth, min_samples_split, random_state, n_train, n_test, n_features, target_mean, target_std, oob_score, n_jobs.

See the solution
    mlflow.log_params({
        "test_size": 0.2,
        "max_depth": 5,
        "min_samples_split": 2,
        "random_state": RANDOM_STATE,
        "n_train": len(X_train),
        "n_test": len(X_test),
        "n_features": X.shape[1],
        "target_mean": round(float(y.mean()), 4),
        "target_std": round(float(y.std()), 4),
        "oob_score": True,
        "n_jobs": -1
    })
  1. Log the OOB convergence figure produced in Exercise 5, which shows how the OOB error evolves as the number of trees increases.

Use mlflow.log_figure(), which takes a matplotlib figure object and a filename (used as the artifact path in MLflow). The figure was returned by the rf_error_oob_plot() function called in the previous exercise.

See the solution
    mlflow.log_figure(oob_error_ntrees, "convergence_ntrees_oob_error.png")
  1. Log the best hyperparameters found by the cross-validation grid search.

The GridSearchCV object exposes the best hyperparameter combination via grid_search.best_params_, which is already a dictionary compatible with mlflow.log_params().

See the solution
    mlflow.log_params(grid_search.best_params_)
  1. Compute and log the evaluation metrics on the test set: RMSE, MAE and R².

First generate predictions with rf_best.predict(X_test), then compute the three metrics using root_mean_squared_error, mean_absolute_error and r2_score from sklearn.metrics. Build a dictionary and pass it to mlflow.log_metrics().

See the solution
    from sklearn.metrics import root_mean_squared_error, r2_score, mean_absolute_error

    y_pred = rf_best.predict(X_test)
    residuals = y_test - y_pred

    metrics = {
        "rmse": root_mean_squared_error(y_test, y_pred),
        "mae": mean_absolute_error(y_test, y_pred),
        "r2": r2_score(y_test, y_pred),
    }

    mlflow.log_metrics(metrics)
  1. Log all the diagnostic figures from the Metrics section: residuals distribution, QQ plot, target distribution for the test set, target distribution for the predictions, and the permutation feature importance plot.

Use mlflow.log_figure() for each figure. The figures are generated by the functions created in the Metrics section:

  • residuals_distribution(residuals, score) — distribution of residuals;
  • QQplot(y_test, y_pred) — quantile-quantile plot;
  • target_distribution(y_test) and target_distribution(y_pred) — distributions of actual and predicted values;
  • permutation_importance(calculate_importance(...)) — ranked feature importances based on permutation.

Each call to mlflow.log_figure() requires a figure object and a destination filename (artifact path).

See the solution

    mlflow.log_figure(
        residuals_distribution(residuals, metrics["r2"]),
        "residuals_distrib.png"
    )
    mlflow.log_figure(QQplot(y_test, y_pred), "qqplot.png")
    mlflow.log_figure(target_distribution(y_test), "y_test_distrib.png")
    mlflow.log_figure(target_distribution(y_pred), "y_pred_distrib.png")
    mlflow.log_figure(
        permutation_importance(
            calculate_importance(X_test, y_test, RANDOM_STATE, rf_best, "r2")
        ),
        "importance.png"
    )
  1. Log the trained model as a scikit-learn artifact, together with its input/output signature and an input example, and register it under a name in the MLflow model registry.

Note: The model signature describes the expected schema for model inputs and outputs. It is inferred from a sample of training data and predictions and enables MLflow to validate inputs at serving time.

Use mlflow.sklearn.log_model() with the following arguments:

  • sk_model: the fitted model object (rf_best);
  • name: the artifact subdirectory name RF;
  • signature: inferred using infer_signature(X_train, rf_best.predict(X_train));
  • input_example: a few rows of training data (e.g. X_train.head(5)) that illustrate the expected input format;
  • registered_model_name: a versioned name in the MLflow model registry, e.g. "rf-run".
See the solution
    import mlflow.sklearn
    from mlflow.models.signature import infer_signature

    signature = infer_signature(X_train, rf_best.predict(X_train))

    mlflow.sklearn.log_model(
        sk_model=rf_best,
        name="RF",
        signature=signature,
        input_example=X_train.head(5),
        registered_model_name="rf-run",
    )
  1. Retrieve and print the run ID at the end of the logging block to confirm that the run was registered successfully.

Inside an active mlflow.start_run() block, you can access the current run via mlflow.active_run(). The run ID is available at mlflow.active_run().info.run_id.

See the solution
    print(f"Run ID: {mlflow.active_run().info.run_id}")

Congrats 🎉! You have logged a complete ML experiment in MLflow: configuration tags, hyperparameters, evaluation metrics, diagnostic figures and the model artifact with its signature. You can now open the MLflow UI to inspect and compare your runs.