
[Feature request] Add a possibility to persist artifacts besides the model itself #46

Open
benjamin-work opened this issue Aug 29, 2017 · 6 comments


@benjamin-work
Collaborator

At the moment, only the model can be persisted and loaded. However, there are scenarios that necessitate saving and loading additional data.

For example, assume we have a regression problem. We want to normalize the targets to a certain range during training, but when the predict service is called, the predictions should be mapped back to the original range. Transforming the targets is not part of an sklearn pipeline, so we may do it during data loading. However, when we start the prediction service, we need access to that mapping. Currently, we would either have to load the data again to regenerate the mapping, or try to save the mapping as an attribute of the model.

Ideally, we would be able to save and load the mapping using Palladium's own tools. The solution should not be too specific to the example above, but rather provide a general way to persist additional artifacts.
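For concreteness, here is a minimal sketch of the scenario (the loader and file format are made up): the target mapping is computed during data loading, outside any sklearn pipeline, so the prediction service has no obvious way to get hold of it later.

```python
import numpy as np

def load_data(path):
    # Hypothetical loader: the target mapping is derived here, outside any
    # sklearn pipeline, so only the training run ever sees it.
    data = np.loadtxt(path, delimiter=',')
    X, y = data[:, :-1], data[:, -1]
    y_min, y_max = y.min(), y.max()
    y_scaled = (y - y_min) / (y_max - y_min)
    return X, y_scaled, (y_min, y_max)  # where should (y_min, y_max) live?
```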

@dnouri
Collaborator

dnouri commented Aug 29, 2017

Another way to deal with this is to move the normalization into a model wrapper (a "meta-estimator" in scikit-learn terms). A NormalizeTarget wrapper would normalize the targets on the way in and map predictions back on the way out. The model becomes somewhat more self-contained this way, which may be a good thing regardless.
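A minimal sketch of such a wrapper, assuming a simple min-max scaling of the targets (the class is hypothetical, not part of Palladium or scikit-learn):

```python
from sklearn.base import BaseEstimator, RegressorMixin


class NormalizeTarget(BaseEstimator, RegressorMixin):
    """Wrap a regressor: scale targets to [0, 1] for fitting and map
    predictions back to the original range."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        # Remember the original target range; it travels with the model.
        self.y_min_ = float(y.min())
        self.y_max_ = float(y.max())
        y_scaled = (y - self.y_min_) / (self.y_max_ - self.y_min_)
        self.estimator.fit(X, y_scaled)
        return self

    def predict(self, X):
        # De-normalize on the way out.
        y_scaled = self.estimator.predict(X)
        return y_scaled * (self.y_max_ - self.y_min_) + self.y_min_
```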

@benjamin-work
Collaborator Author

Yes, for this specific case, that would work. For other cases, it could be an awkward solution. I could imagine a more general solution having a "cache" that is simply stored together with the model, so that there is no need to handle separate files.

@dnouri
Collaborator

dnouri commented Aug 29, 2017

There's a utility called palladium.interfaces.annotate, which Palladium uses to store the model version along with the model pickle. It's essentially a glorified way of sticking an attribute onto the object before it's pickled.

To stick something in, you would call annotate(model, {'useful': 'stuffs'}); to get it out again (say in production, after loading): stuffs = annotate(model)['useful'].
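A sketch of how that could cover the target-range example, based only on the annotate calls described above (the surrounding function names and the 'target_range' key are made up):

```python
from palladium.interfaces import annotate

def train_model(model, X, y):
    lo, hi = float(y.min()), float(y.max())
    model.fit(X, (y - lo) / (hi - lo))
    # Attach the extra artifact to the model before it is persisted.
    annotate(model, {'target_range': [lo, hi]})
    return model

def predict_original_scale(model, X):
    # In production, after the model has been loaded again.
    lo, hi = annotate(model)['target_range']
    return model.predict(X) * (hi - lo) + lo
```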

@benjamin-work
Collaborator Author

Okay, so you would suggest using this if extra data needs to be saved?

@dnouri
Collaborator

dnouri commented Aug 29, 2017

> Okay, so you would suggest using this if extra data needs to be saved?

Hmm, I just had another look, and it seems that at least palladium.persistence.Database assumes it can call json.dumps on the annotations. (It then stores the annotations in a separate column.) So this won't work for all types of data.

That leaves us with what I assume you already did, which is sticking attributes on the model object. Not too nice, but probably nicer than having to worry about storing extra data somewhere else and supporting that in all persisters.
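The attribute-on-the-model workaround as a tiny, self-contained sketch (the attribute name is arbitrary; plain pickle stands in for a Palladium persister):

```python
import pickle
import numpy as np
from sklearn.linear_model import LinearRegression

X, y = np.random.rand(20, 3), np.random.rand(20) * 100.0
lo, hi = float(y.min()), float(y.max())

model = LinearRegression().fit(X, (y - lo) / (hi - lo))
model.target_range_ = (lo, hi)                # stick the artifact on the model

restored = pickle.loads(pickle.dumps(model))  # persistence round trip
lo, hi = restored.target_range_               # still there after loading
print(restored.predict(X[:2]) * (hi - lo) + lo)
```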

If you prefer to use something like annotate, then we could make a trivial change and allow other keys besides __metadata__ in annotate. (__metadata__ is the key it's trying to be clever about when persisting.)

@benjamin-work
Collaborator Author

But isn't the model just a blob to the persister anyway? Instead of persisting the model alone, could we not persist something like {'model': model, 'cache': cache}? That way, we wouldn't need to store anything extra or worry about keeping the model and the extra data in sync.
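A minimal sketch of that idea, with plain pickle standing in for a Palladium persister (the persisters and the prediction service would of course have to understand such a bundle for this to work end to end):

```python
import pickle

def save_bundle(path, model, cache):
    # Persist model and extra artifacts as one object, so they stay in sync.
    with open(path, 'wb') as f:
        pickle.dump({'model': model, 'cache': cache}, f)

def load_bundle(path):
    with open(path, 'rb') as f:
        bundle = pickle.load(f)
    return bundle['model'], bundle['cache']
```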
