Fix Pandas codec decoding from numpy arrays #1751
Merged
Pandas codec currently breaks the mlflow runtime's schema enforcement (#1625). This is because numeric mlflow datatypes are assigned the `NumpyCodec` content type by default, which decodes `PandasCodec`'s explicit batch size into a column vector.

However, `pd.Series` expects one-dimensional data, and when the `_decoded_payload` is converted to a `pd.Series`, the result is unexpected.

This seems like a bug on the pandas side: `dtype="int64"` should have thrown an error (and in fact it does for higher dimensions, but for some reason it accepts `[np.array([1]), np.array([2]), np.array([3])]` and weirdly casts the elements into tuples). Either way, mlserver should have dropped the trailing axis, as `pd.Series` already implies a column vector.

Judging from an existing test, the conversion to `list` in the codec was meant to accommodate the use case of a `pd.Series` of tensors. I'd argue that in the context of the Pandas codec this should be made explicit with shape `[1, 1]` and datatype `BYTES` for array elements. In fact, a test above does so, specifying that the intended shape is a `[2,1]` column vector with datatype `BYTES` for `list` elements.

This PR explicitly reshapes `np.array` payloads to the expected shapes before conversion to `pd.Series`.
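
For illustration, the idea of dropping the trailing axis can be sketched as below. This is only a minimal sketch, not the code in this PR: `to_series` is a hypothetical helper name, and it assumes the decoded payload is a NumPy array of shape `(N,)` or `(N, 1)`.

```python
import numpy as np
import pandas as pd


def to_series(name, decoded):
    """Hypothetical helper (not mlserver's API): build a 1-D Series
    from a decoded NumPy payload, dropping a trailing axis of size 1."""
    if isinstance(decoded, np.ndarray) and decoded.ndim == 2 and decoded.shape[1] == 1:
        # (N, 1) column vector -> (N,) so pd.Series receives 1-D data
        decoded = decoded.reshape(-1)
    return pd.Series(decoded, name=name)


# A batch of three scalars decoded as a column vector
column = np.array([[1], [2], [3]], dtype="int64")
series = to_series("a", column)
```

With the reshape in place, the resulting Series holds plain integers with the array's own dtype, so no list of single-element arrays is ever handed to `pd.Series`.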