Failed to load model while following the tutorial 'Creating a custom serving runtime in KServe ModelMesh' #504

Open
JimBeam2019 opened this issue May 4, 2024 · 2 comments

@JimBeam2019

Describe the bug

While following the tutorial 'Creating a custom serving runtime in KServe ModelMesh' from the IBM site, I was trying to make a small adjustment: loading the sklearn mnist-svm.joblib model from localMinIO instead. However, it failed to load the model and returned the following error:

MLServer Adapter.MLServer Adapter Server.LoadModel MLServer failed to load model {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "error": "rpc error: code = NotFound desc = Model multi-model-isvc__isvc-1ee2e56a33 not found"}

I am wondering whether this is a bug or whether I have made a mistake in the configuration. Any advice would be appreciated, and please let me know if you need any further details.

To Reproduce
Steps to reproduce the behavior:

  1. Install ModelMesh Serving in a local minikube cluster, following the installation instructions.
  2. Create the custom ML model with the code below.
from mlserver.model import MLModel
from mlserver.utils import get_model_uri
from mlserver.errors import InferenceError
from mlserver.codecs import DecodedParameterName
from mlserver.types import (
    InferenceRequest,
    InferenceResponse,
    ResponseOutput,
)
import logging
from joblib import load
import numpy as np

from os.path import exists

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

_to_exclude = {
    "parameters": {DecodedParameterName, "headers"},
    'inputs': {"__all__": {"parameters": {DecodedParameterName, "headers"}}}
}

WELLKNOWN_MODEL_FILENAMES = ["mnist-svm.joblib"]

class CustomMLModel(MLModel):

  async def load(self) -> bool:
    model_uri = await get_model_uri(
        self._settings, wellknown_filenames=WELLKNOWN_MODEL_FILENAMES
    )
    logger.info(f"Model load URI: {model_uri}")

    if exists(model_uri):
      logger.info(f"Loading MNIST model from {model_uri}")
      self._model = load(model_uri)
      logger.info("Model loaded successfully")
    else:
      logger.error(f"Model does not exist at {model_uri}")
      self.ready = False
      return self.ready

    self.ready = True
    return self.ready

  async def predict(self, payload: InferenceRequest) -> InferenceResponse:
    # Collect the raw data and tensor names from the request inputs.
    input_data = [inp.data for inp in payload.inputs]
    input_name = [inp.name for inp in payload.inputs]
    input_data_array = np.array(input_data)
    result = self._model.predict(input_data_array)
    predictions = np.array(result)

    logger.info(f"Predict result is: {result}")
    return InferenceResponse(
        id=payload.id,
        model_name=self.name,
        model_version=self.version,
        outputs=[
            ResponseOutput(
                name=str(input_name[0]),
                shape=predictions.shape,
                datatype="INT64",
                data=predictions.tolist(),
            )
        ],
    )
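
For anyone reproducing this, here is a minimal local smoke test of CustomMLModel (not part of the original report). It is a sketch assuming mlserver 1.3.2 and a local copy of mnist-svm.joblib; the model name and URI below are placeholders, and the 64-feature input matches the sklearn digits dataset.

import asyncio

from mlserver.settings import ModelParameters, ModelSettings
from mlserver.types import InferenceRequest, RequestInput

from custom_model import CustomMLModel

async def main():
    # Placeholder settings: "./mnist-svm.joblib" stands in for a local copy
    # of the model pulled from MinIO.
    settings = ModelSettings(
        name="mnist-svm",
        implementation="custom_model.CustomMLModel",
        parameters=ModelParameters(uri="./mnist-svm.joblib"),
    )
    model = CustomMLModel(settings)
    print("ready:", await model.load())

    # The sklearn digits SVM expects 8x8 images flattened to 64 features.
    request = InferenceRequest(
        inputs=[
            RequestInput(name="predict", shape=[1, 64], datatype="FP32", data=[0.0] * 64)
        ]
    )
    response = await model.predict(request)
    print(response.outputs[0].data)

asyncio.run(main())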
  3. Build a Docker image with the Dockerfile below, named dev.local/xgb-model:dev.2405042123.
FROM python:3.9.13

RUN pip3 install --no-cache-dir mlserver==1.3.2 scikit-learn==1.4.0 joblib==1.3.2

COPY --chown=${USER} ./custom_model.py /opt/custom_model.py
ENV PYTHONPATH=/opt/
WORKDIR /opt

ENV MLSERVER_MODELS_DIR=/models/_mlserver_models \
    MLSERVER_GRPC_PORT=8001 \
    MLSERVER_HTTP_PORT=8002 \
    MLSERVER_METRICS_PORT=8082 \
    MLSERVER_LOAD_MODELS_AT_STARTUP=false \
    MLSERVER_DEBUG=false \
    MLSERVER_PARALLEL_WORKERS=1 \
    MLSERVER_GRPC_MAX_MESSAGE_LENGTH=33554432 \
    # https://github.com/SeldonIO/MLServer/pull/748
    MLSERVER__CUSTOM_GRPC_SERVER_SETTINGS='{"grpc.max_metadata_size": "32768"}' \
    MLSERVER_MODEL_NAME=dummy-model

ENV MLSERVER_MODEL_IMPLEMENTATION=custom_model.CustomMLModel

CMD ["mlserver", "start", "${MLSERVER_MODELS_DIR}"]
  4. Create a serving runtime with the YAML file below.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: custom-runtime-0.x
spec:
  supportedModelFormats:
    - name: custom-model
      version: "1"
      autoSelect: true
  protocolVersions:
    - grpc-v2
  multiModel: true
  grpcDataEndpoint: port:8001
  grpcEndpoint: port:8085
  containers:
    - name: mlserver
      image: dev.local/xgb-model:dev.2405042123
      imagePullPolicy: IfNotPresent
      env:
        - name: MLSERVER_MODELS_DIR
          value: "/models/_mlserver_models/"
        - name: MLSERVER_GRPC_PORT
          value: "8001"
        - name: MLSERVER_HTTP_PORT
          value: "8002"
        - name: MLSERVER_LOAD_MODELS_AT_STARTUP
          value: "false"
        - name: MLSERVER_MODEL_NAME
          value: dummy-model
        - name: MLSERVER_HOST
          value: "127.0.0.1"
        - name: MLSERVER_GRPC_MAX_MESSAGE_LENGTH
          value: "-1"
        - name: MLSERVER_MODEL_IMPLEMENTATION
          value: "custom_model.CustomMLModel"
        - name: MLSERVER_DEBUG
          value: "true"
        - name: MLSERVER_MODEL_PARALLEL_WORKERS
          value: "0"
      resources:
        requests:
          cpu: "1"
          memory: "1Gi"
        limits:
          cpu: "2"
          memory: "1Gi"
  builtInAdapter:
    serverType: mlserver
    runtimeManagementPort: 8001
    memBufferBytes: 134217728
    modelLoadingTimeoutMillis: 90000
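
An aside on the builtInAdapter numbers (an inference from the adapter log further down, not something documented here): the CapacityInBytes the adapter reports appears to be the container memory request minus memBufferBytes.

# Assumed relationship, consistent with the adapter log below:
#   CapacityInBytes = container memory request - memBufferBytes
mem_request_bytes = 1 * 1024**3   # resources.requests.memory: "1Gi"
mem_buffer_bytes = 134217728      # builtInAdapter.memBufferBytes
print(mem_request_bytes - mem_buffer_bytes)  # 939524096, matching the log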
  5. Create an inference service with the YAML file below.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: minio-model-isvc
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: custom-model
      runtime: custom-runtime-0.x
      storage:
        key: localMinIO
        path: sklearn/mnist-svm.joblib
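
One quick way to confirm that storage.path points at a real object before involving ModelMesh (a debugging sketch, not from the original report; the adapter log below shows the download succeeding, so in this case the path itself checks out). The endpoint, bucket, and credentials are placeholders for whatever the localMinIO entry of the storage-config secret contains:

import boto3

# Placeholder connection details -- substitute the values from the
# localMinIO entry of the storage-config secret.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="<ACCESS_KEY>",
    aws_secret_access_key="<SECRET_KEY>",
)

# head_object raises a ClientError if the key does not exist.
s3.head_object(Bucket="<BUCKET>", Key="sklearn/mnist-svm.joblib")
print("object found")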
  6. After the ModelMesh pods are running, open the logs of the mlserver-adapter container.

Expected behavior

The model should have loaded successfully.

Screenshots

Instead, the mlserver-adapter logs show the following:

2024-05-04T14:00:34Z    INFO    MLServer Adapter        Starting MLServer Adapter       {"adapter_config": {"Port":8085,"MLServerPort":8001,"MLServerContainerMemReqBytes":1073741824,"MLServerMemBufferBytes":134217728,"CapacityInBytes":939524096,"MaxLoadingConcurrency":1,"ModelLoadingTimeoutMS":90000,"DefaultModelSizeInBytes":1000000,"ModelSizeMultiplier":1.25,"RuntimeVersion":"dev.2405042123","LimitModelConcurrency":0,"RootModelDir":"/models/_mlserver_models","UseEmbeddedPuller":true}}
2024-05-04T14:00:34Z    INFO    MLServer Adapter.MLServer Adapter Server        Created root MLServer model directory   {"path": "/models/_mlserver_models"}
2024-05-04T14:00:34Z    INFO    MLServer Adapter.MLServer Adapter Server        Connecting to MLServer...       {"port": 8001}
2024-05-04T14:00:34Z    INFO    MLServer Adapter.MLServer Adapter Server        Initializing Puller     {"Dir": "/models"}
2024-05-04T14:00:34Z    INFO    MLServer Adapter.MLServer Adapter Server        MLServer runtime adapter started
2024-05-04T14:00:34Z    INFO    MLServer Adapter.MLServer Adapter Server.client-cache   starting clean up of cached clients
2024-05-04T14:00:34Z    INFO    MLServer Adapter        Adapter will run at port        {"port": 8085, "MLServer port": 8001}
2024-05-04T14:00:34Z    INFO    MLServer Adapter        Adapter gRPC Server registered, now serving
2024-05-04T14:00:44Z    INFO    MLServer Adapter.MLServer Adapter Server        Using runtime version returned by MLServer      {"version": "1.3.2"}
2024-05-04T14:00:44Z    INFO    MLServer Adapter.MLServer Adapter Server        runtimeStatus   {"Status": "status:READY capacityInBytes:939524096 maxLoadingConcurrency:1 modelLoadingTimeoutMs:90000 defaultModelSizeInBytes:1000000 runtimeVersion:\"1.3.2\" methodInfos:{key:\"inference.GRPCInferenceService/ModelInfer\" value:{idInjectionPath:1}} methodInfos:{key:\"inference.GRPCInferenceService/ModelMetadata\" value:{idInjectionPath:1}}"}
2024-05-04T14:00:52Z    INFO    MLServer Adapter.MLServer Adapter Server.LoadModel      Model details   {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "modelType": "custom-model", "modelPath": "sklearn/mnist-svm.joblib"}
2024-05-04T14:00:52Z    DEBUG   MLServer Adapter.MLServer Adapter Server        Reading storage credentials
2024-05-04T14:00:52Z    DEBUG   MLServer Adapter.MLServer Adapter Server        creating new repository client  {"type": "s3", "cacheKey": "s3|0x33b60418eef4115e"}
2024-05-04T14:00:52Z    DEBUG   MLServer Adapter.MLServer Adapter Server        found objects to download       {"type": "s3", "cacheKey": "s3|0x33b60418eef4115e", "path": "sklearn/mnist-svm.joblib", "count": 1}
2024-05-04T14:00:52Z    DEBUG   MLServer Adapter.MLServer Adapter Server        downloading object      {"type": "s3", "cacheKey": "s3|0x33b60418eef4115e", "path": "sklearn/mnist-svm.joblib", "filename": "/models/multi-model-isvc__isvc-1ee2e56a33/mnist-svm.joblib"}
2024-05-04T14:00:52Z    INFO    MLServer Adapter.MLServer Adapter Server        Calculated disk size    {"modelFullPath": "/models/multi-model-isvc__isvc-1ee2e56a33/mnist-svm.joblib", "disk_size": 344817}
2024-05-04T14:00:52Z    INFO    MLServer Adapter.MLServer Adapter Server.LoadModel      Generated model settings file   {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "schemaPath": "", "implementation": ""}
2024-05-04T14:00:52Z    INFO    MLServer Adapter.MLServer Adapter Server.LoadModel      Adapted model directory for standalone file/dir {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "sourcePath": "/models/multi-model-isvc__isvc-1ee2e56a33/mnist-svm.joblib", "isDir": false, "symLinkPath": "/models/_mlserver_models/multi-model-isvc__isvc-1ee2e56a33/mnist-svm.joblib", "generatedSettingsFile": "/models/_mlserver_models/multi-model-isvc__isvc-1ee2e56a33/model-settings.json"}
2024-05-04T14:00:52Z    ERROR   MLServer Adapter.MLServer Adapter Server.LoadModel      MLServer failed to load model   {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "error": "rpc error: code = NotFound desc = Model multi-model-isvc__isvc-1ee2e56a33 not found"}
github.com/kserve/modelmesh-runtime-adapter/model-mesh-mlserver-adapter/server.(*MLServerAdapterServer).LoadModel
        /opt/app/model-mesh-mlserver-adapter/server/server.go:137
github.com/kserve/modelmesh-runtime-adapter/internal/proto/mmesh._ModelRuntime_LoadModel_Handler
        /opt/app/internal/proto/mmesh/model-runtime_grpc.pb.go:206
google.golang.org/grpc.(*Server).processUnaryRPC
        /root/go/pkg/mod/google.golang.org/[email protected]/server.go:1335
google.golang.org/grpc.(*Server).handleStream
        /root/go/pkg/mod/google.golang.org/[email protected]/server.go:1712
google.golang.org/grpc.(*Server).serveStreams.func1.1
        /root/go/pkg/mod/google.golang.org/[email protected]/server.go:947
2024-05-04T14:00:53Z    INFO    MLServer Adapter.MLServer Adapter Server.UnloadModel    Unload request for model not found in MLServer  {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "error": "rpc error: code = NotFound desc = Model multi-model-isvc__isvc-1ee2e56a33 not found"}
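
One way to narrow down the NotFound above (a debugging sketch, not from the original report): if mlserver's V2 model repository endpoints are enabled, the runtime's HTTP port (8002 in this setup) can be asked which models it has actually registered. The in-pod URL is an assumption.

import requests

# Run from inside the runtime pod (e.g. via kubectl exec); 8002 is
# MLSERVER_HTTP_PORT from the ServingRuntime above.
resp = requests.post("http://localhost:8002/v2/repository/index", json={})
resp.raise_for_status()

# Each entry reports a model name and its state (READY, UNAVAILABLE, ...).
for model in resp.json():
    print(model)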

Environment (please complete the following information):

  • OS: Ubuntu 22.04.4 LTS

@JimBeam2019 JimBeam2019 added the bug Something isn't working label May 4, 2024

ckadner commented May 14, 2024

I was trying to make a small adjustment: loading the sklearn mnist-svm.joblib model from localMinIO instead.

Did the tutorial or example work without making changes?

@rafvasq -- can you spot something obvious? I would have to go through your tutorial myself and debug 😊


liaspas commented Oct 11, 2024

Facing the same issue.
