Currently {mlflow} doesn’t support directly logging R models to Unity Catalog. This section will cover why, and then how to overcome each roadblock.
1.1 Unity Catalog Model Requirements
For models to be logged into Unity Catalog they must have a model signature. The model signature defines the schema for model inputs and outputs.
Typically, when using Python, this would be inferred via model input examples. Input examples are optional but strongly recommended.
The documentation discusses signature enforcement, but this currently isn’t implemented for R. You can therefore decide whether the signature is a dummy value for the sake of moving forward, or a correct one that clearly communicates the behaviour of the model.
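For R, a rough equivalent of inferring a signature from examples could look like the sketch below. `infer_signature_sketch` is a hypothetical helper, not part of {mlflow}, and the mapping from R types to MLflow type names (boolean, integer, double, string) is an assumption that only covers simple atomic columns.

```r
# Hypothetical helper (not part of {mlflow}): build a signature list from an
# example input data frame and an example output vector.
infer_signature_sketch <- function(input_example, output_example) {
  # Assumed mapping from R column types to MLflow schema type names
  to_mlflow_type <- function(x) {
    if (is.factor(x) || inherits(x, c("Date", "POSIXt"))) {
      stop("Unsupported column type: ", class(x)[1])
    }
    if (is.logical(x)) return("boolean")
    if (is.integer(x)) return("integer")
    if (is.double(x)) return("double")
    if (is.character(x)) return("string")
    stop("Unsupported column type: ", class(x)[1])
  }

  # One list(type, name) entry per input column
  inputs <- unname(Map(
    function(col, name) list(type = to_mlflow_type(col), name = name),
    input_example,
    names(input_example)
  ))

  # A single unnamed output column
  outputs <- list(list(type = to_mlflow_type(output_example)))

  list(inputs = inputs, outputs = outputs)
}

example_input <- data.frame(x = c(1, 2, 3))
example_output <- example_input$x / 2
signature <- infer_signature_sketch(example_input, example_output)
str(signature)
```

The result has the same shape as a hand-written signature: a list with `inputs` and `outputs`, each a list of column descriptions.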
Important
It’s important to clarify that, for Python, the signature is enforced at inference time, not when registering the model to Unity Catalog.
The correctness of the signature is not validated when registering the model; it just has to be syntactically valid.
So, let’s look at how the existing code logs models in the crate flavour:
1. Create the directory to save the model if it doesn’t exist; if it does, empty it.
2. Serialise the model, which is an object of class crate (from the {carrier} package).
3. Save the serialised model via saveRDS to the directory as crate.bin.
4. Define the model specification, which contains the metadata required to ensure reproducibility. In this case it only specifies a version and the file in which the model can be found.
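Those four steps can be sketched in base R alone. This is a simplified stand-in, not the {mlflow} source: `save_crate_sketch` is a hypothetical name, a plain function takes the place of a crate object, the version string is an assumed value, and writing the specification out to the MLmodel file is left out.

```r
# Hypothetical stand-in for the crate flavour's save method; base R only.
save_crate_sketch <- function(model, path, model_spec = list()) {
  # 1. Create the directory, emptying it first if it already exists
  if (dir.exists(path)) unlink(path, recursive = TRUE)
  dir.create(path, recursive = TRUE)

  # 2. Serialise the model to a raw vector
  serialized <- serialize(model, NULL)

  # 3. Save the serialised model to the directory as crate.bin
  saveRDS(serialized, file.path(path, "crate.bin"))

  # 4. Record the flavour in the model specification
  #    (the version value is assumed, for illustration only)
  model_spec$flavors <- c(
    model_spec$flavors,
    list(crate = list(version = "0.1.0", model = "crate.bin"))
  )
  model_spec
}

half <- function(x) x / 2
path <- file.path(tempdir(), "crate-sketch")
spec <- save_crate_sketch(half, path, model_spec = list(flavors = list()))

# Round trip: read the file back and call the restored function
restored <- unserialize(readRDS(file.path(path, "crate.bin")))
restored(1:4)
```

Note that nothing in the returned specification describes inputs or outputs, which is the gap a signature fills.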
The missing puzzle piece is the definition of a signature. Instead of explicitly adding code to the crate flavour itself, we’ll take advantage of the model_spec parameter.
That means we can focus on mlflow::mlflow_log_model directly. We’d need to adjust the code as follows:
    mlflow_log_model <- function(model, artifact_path, signature, ...) {  # (1)
      temp_path <- fs::path_temp(artifact_path)

      model_spec <- mlflow_save_model(
        model,
        path = temp_path,
        model_spec = list(  # (2)
          utc_time_created = mlflow_timestamp(),
          run_id = mlflow_get_active_run_id_or_start_run(),
          artifact_path = artifact_path,
          flavors = list(),
          signature = signature
        ),
        ...
      )

      res <- mlflow_log_artifact(path = temp_path, artifact_path = artifact_path)

      tryCatch(
        {
          mlflow:::mlflow_record_logged_model(model_spec)
        },
        error = function(e) {
          warning(
            paste(
              "Logging model metadata to the tracking server has failed, possibly due to older",
              "server version. The model artifacts have been logged successfully.",
              "In addition to exporting model artifacts, MLflow clients 1.7.0 and above",
              "attempt to record model metadata to the tracking store. If logging to a",
              "mlflow server via REST, consider upgrading the server version to MLflow",
              "1.7.0 or above.",
              sep = " "
            )
          )
        }
      )

      res
    }

1. Add a new parameter signature
2. Propagate signature to the model_spec parameter when invoking mlflow::mlflow_save_model
The benefit of this method is that all model flavours inherit the capability to log a signature.
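To see why every flavour benefits, here is a toy sketch of the dispatch pattern. All names are made up for illustration (`save_model_sketch`, `log_model_sketch`) and the flavour entries are placeholders; the point is only that a specification built once in the logging function reaches whichever flavour method handles the model.

```r
# Toy generic standing in for a save-model generic (hypothetical names).
save_model_sketch <- function(model, model_spec = list(), ...) {
  UseMethod("save_model_sketch")
}

# Each "flavour" only adds its own entry and leaves the rest of the spec alone.
save_model_sketch.function <- function(model, model_spec = list(), ...) {
  model_spec$flavors$crate <- list(model = "crate.bin")
  model_spec
}

save_model_sketch.lm <- function(model, model_spec = list(), ...) {
  model_spec$flavors$lm <- list(model = "lm.rds")
  model_spec
}

# The logging function builds the spec, signature included, exactly once.
log_model_sketch <- function(model, signature = NULL, ...) {
  save_model_sketch(
    model,
    model_spec = list(flavors = list(), signature = signature),
    ...
  )
}

sig <- list(
  inputs = list(list(type = "double", name = "x")),
  outputs = list(list(type = "double"))
)

spec_fn <- log_model_sketch(function(x) x / 2, signature = sig)
spec_lm <- log_model_sketch(lm(dist ~ speed, data = cars), signature = sig)

names(spec_fn$flavors)
names(spec_lm$flavors)
```

Both specifications carry the same signature even though neither flavour method mentions it.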
1.2 Working Through the Solution
To keep things simple we’ll be logging a “model” (a function which divides by two).
    half <- function(x) x / 2
    half(1:10)
[1] 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
Without any changes, a simplified example of logging to {mlflow} would look like:
    library(carrier)
    library(mlflow)

    with(mlflow_start_run(), {
      # typically you'd do more modelling related activities here
      model <- carrier::crate(~half(.x))
      mlflow_log_model(model, "model")  # (1)
    })

1. As discussed earlier, this is where things start to go awry with respect to Unity Catalog
1.2.1 Patching mlflow_log_model
Note
Technically, patching mlflow_log_model isn’t the only way to achieve this fix; you could modify the YAML after it’s written.
I won’t be showing that method as it’s just as tedious and can change depending on the model flavour (with respect to where artifacts may reside). Patching is more robust.
    mlflow_log_model <- function(model, artifact_path, signature = NULL, ...) {  # (1)

      # (2)
      format_signature <- function(signature) {
        lapply(signature, function(x) {
          jsonlite::toJSON(x, auto_unbox = TRUE)
        })
      }

      temp_path <- fs::path_temp(artifact_path)

      model_spec <- mlflow_save_model(
        model,
        path = temp_path,
        model_spec = list(
          utc_time_created = mlflow:::mlflow_timestamp(),
          run_id = mlflow:::mlflow_get_active_run_id_or_start_run(),
          artifact_path = artifact_path,
          flavors = list(),
          signature = format_signature(signature)  # (3)
        ),
        ...
      )

      res <- mlflow_log_artifact(path = temp_path, artifact_path = artifact_path)

      tryCatch(
        {
          mlflow:::mlflow_record_logged_model(model_spec)
        },
        error = function(e) {
          warning(
            paste(
              "Logging model metadata to the tracking server has failed, possibly due to older",
              "server version. The model artifacts have been logged successfully.",
              "In addition to exporting model artifacts, MLflow clients 1.7.0 and above",
              "attempt to record model metadata to the tracking store. If logging to a",
              "mlflow server via REST, consider upgrading the server version to MLflow",
              "1.7.0 or above.",
              sep = " "
            )
          )
        }
      )

      res
    }

    # overriding the function in the existing mlflow namespace
    assignInNamespace("mlflow_log_model", mlflow_log_model, ns = "mlflow")

1. signature has been added to the function parameters; it defaults to NULL so that existing code won’t break
2. A format_signature function is added so that the JSON doesn’t need to be written by hand; it is defined inside the function for simplicity
3. signature is propagated to mlflow_save_model’s model_spec parameter, which will write a valid signature
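To make the effect of format_signature concrete, the sketch below runs it on its own, outside of mlflow_log_model, against a small signature. It assumes {jsonlite} is installed; the example signature is the one for a single double input named x.

```r
library(jsonlite)

# Same helper as in the patched function, defined standalone for inspection
format_signature <- function(signature) {
  lapply(signature, function(x) {
    jsonlite::toJSON(x, auto_unbox = TRUE)
  })
}

signature <- list(
  inputs = list(list(type = "double", name = "x")),
  outputs = list(list(type = "double"))
)

formatted <- format_signature(signature)

# Each element becomes a JSON string describing an array of columns
cat(formatted$inputs, "\n")
cat(formatted$outputs, "\n")
```

With auto_unbox = TRUE, length-one values such as "double" are written as JSON scalars rather than one-element arrays, so each column comes out as an object like {"type":"double","name":"x"}.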
1.2.2 Logging Model with a Signature
    with(mlflow_start_run(), {
      # typically you'd do more modelling related activities here
      model <- carrier::crate(~half(.x))

      # (1)
      signature <- list(
        inputs = list(list(type = "double", name = "x")),
        outputs = list(list(type = "double"))
      )

      mlflow_log_model(model, "model", signature = signature)  # (2)
    })

1. Explicitly define a signature: a list containing inputs and outputs, each of which is a list of lists
2. Pass the defined signature to the now patched mlflow_log_model function
1.2.3 Registration to Unity Catalog
Now that the prerequisite of adding a model signature has been satisfied there is one last hurdle to overcome, registering to Unity Catalog.
The hurdle is that {mlflow} hasn’t yet been updated to support registration to Unity Catalog directly. At this time, the easiest way to overcome this is to register the run via Python.
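One way to stay in an R session while still delegating to Python is to call the Python mlflow package through {reticulate}. The sketch below is an assumption-laden illustration, not a tested recipe: `model_uri_for_run` and `register_to_uc_sketch` are hypothetical helpers, the run ID and the three-level model name in the commented call are placeholders, and it presumes {reticulate} plus a Python environment with mlflow installed and Databricks authentication already configured.

```r
# Build the "runs:/<run_id>/<artifact_path>" URI that identifies a logged model.
model_uri_for_run <- function(run_id, artifact_path = "model") {
  sprintf("runs:/%s/%s", run_id, artifact_path)
}

# Hypothetical wrapper around the Python mlflow package via {reticulate}.
# Nothing below is executed until the function is called.
register_to_uc_sketch <- function(run_id, name, artifact_path = "model") {
  mlflow_py <- reticulate::import("mlflow")

  # Point the Python client at the Unity Catalog model registry
  mlflow_py$set_registry_uri("databricks-uc")

  # Register the model logged under the given run
  mlflow_py$register_model(model_uri_for_run(run_id, artifact_path), name)
}

# The URI helper can be checked without any Python environment
model_uri_for_run("abc123")

# Placeholder call, left commented out because it needs a real run and workspace:
# register_to_uc_sketch("abc123", "main.default.half_model")
```

The same two Python calls, set_registry_uri and register_model, can equally be run from a Python notebook cell; the wrapper only saves switching languages.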
1.3 Fixing mlflow
Ideally this page wouldn’t exist and {mlflow} would support Unity Catalog. Hopefully I’ll find the time to make a pull request myself sometime soon; until then, this serves as a guide.