Environment and Configuration Setup Discussion #41
Replies: 14 comments 2 replies
-
Does anyone have the vLLM configuration parameters?
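A possible starting point, sketched as a launch command for vLLM's OpenAI-compatible server. The flag names come from vLLM's public CLI; the model name and values here are illustrative assumptions, not official recommendations:

```python
# Hypothetical launch arguments for serving GLM-4-9B with vLLM's
# OpenAI-compatible server; the values are illustrative, not official.
model = "THUDM/glm-4-9b-chat"  # assumed model id

args = [
    "python", "-m", "vllm.entrypoints.openai.api_server",
    "--model", model,
    "--trust-remote-code",           # GLM-4 ships custom modeling code
    "--tensor-parallel-size", "2",   # split weights across two GPUs
    "--max-model-len", "8192",       # cap the context length to fit memory
    "--gpu-memory-utilization", "0.9",
]
print(" ".join(args))
```

Tune `--max-model-len` and `--gpu-memory-utilization` to your GPUs; lowering either is the usual first step when the server OOMs at startup.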
-
Has anyone run into this? The model loads and takes input normally, but GLM-4 errors while generating an answer: RuntimeError: cutlassF: no kernel found to launch!
-
vLLM deployment on two GPUs runs fine when launched directly with Python, but fails when started from the Docker image, reporting an NCCL Error.
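One way to narrow down a container-only NCCL failure is to turn on NCCL's own logging before the process group is created. `NCCL_DEBUG` and `NCCL_P2P_DISABLE` are standard NCCL environment variables; setting them here is a diagnostic sketch, not a fix:

```python
import os

# Diagnostic settings for NCCL failures inside containers; set these
# before torch/vLLM initializes distributed communication so the logs
# show which transport (shared memory, P2P, network) is failing.
os.environ["NCCL_DEBUG"] = "INFO"      # verbose NCCL log output
os.environ["NCCL_P2P_DISABLE"] = "1"   # rule out peer-to-peer transport issues
```

When running under Docker, also check the container flags: NCCL's shared-memory transport commonly needs `--ipc=host` or a larger `--shm-size` than Docker's 64 MB default, which is a frequent cause of "works on the host, fails in the image".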
-
Do the 128K and 1M versions of GLM-4 support function calling? As I recall, only the 8K version of GLM-3 supported it.
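For reference, this is the OpenAI-style tool-definition payload that GLM-4's OpenAI-compatible demo server accepts for function calling. Whether the long-context (128K/1M) checkpoints honor it is exactly the open question above; the payload shape itself is the standard one. The tool name and fields are hypothetical:

```python
# Hedged sketch of an OpenAI-style function-calling request body.
# "get_weather" and its schema are hypothetical; the structure
# ("type"/"function"/"parameters") is the standard tools format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Query current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

request_body = {
    "model": "glm-4",  # assumed model name on the demo server
    "messages": [{"role": "user", "content": "What's the weather in Beijing?"}],
    "tools": tools,
}
```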
-
File "/root/.cache/huggingface/modules/transformers_modules/chatglm4/modeling_chatglm.py", line 939, in _update_model_kwargs_for_generation
-
For vLLM inference with 9b-chat, how much GPU memory does an 8K context need? I couldn't find official figures in the docs.
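There is no official number in the docs, but a back-of-envelope estimate is possible from the model shape. The numbers below (40 layers, 2 grouped KV heads of dim 128, fp16) are assumptions about GLM-4-9B; verify them against `num_layers`, `multi_query_group_num`, and `kv_channels` in the model's `config.json`:

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, dtype_bytes=2):
    """Per-sequence KV-cache size: 2 (K and V) * layers * heads * dim * tokens."""
    return 2 * layers * kv_heads * head_dim * tokens * dtype_bytes

# Assumed GLM-4-9B shape; illustrative, check config.json before relying on it.
weights_gb = 9e9 * 2 / 1e9  # ~18 GB of fp16 weights for 9B parameters
kv_gb = kv_cache_bytes(8192, layers=40, kv_heads=2, head_dim=128) / 1e9

print(f"weights ≈ {weights_gb:.0f} GB, 8K KV cache ≈ {kv_gb:.2f} GB per sequence")
```

On top of this, vLLM pre-allocates a KV block pool according to `--gpu-memory-utilization`, and activations and CUDA context add a few more GB, so treat the sum as a lower bound, not a budget.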
-
If a domestic (Chinese) GPU supports PyTorch and TensorFlow, but the vendor's site doesn't say whether it supports CUDA or ROCm and only claims PyTorch/TensorFlow support, can GLM still be deployed on it?
-
Has anyone hit this error when converting to ONNX, and how should it be fixed? /usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py:1738: UserWarning: The exported ONNX model failed ONNX shape inference. The model will not be executable by the ONNX Runtime. If this is unintended and you believe there is a bug, please report an issue at https://github.com/pytorch/pytorch/issues. Error reported by strict ONNX shape inference: [ShapeInferenceError] (op_type:Add, node name: /Add): A typestr: T, has unsupported type: tensor(bool) (Triggered internally at ../torch/csrc/jit/serialization/export.cpp:1469.)
-
How do I use outlines? When I deploy the OpenAI-compatible server with vLLM and set response_format to json, it errors; it looks like it may be an outlines compatibility issue.
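For comparison, this is what the JSON-mode request body looks like against a vLLM OpenAI-compatible endpoint. vLLM implements guided/JSON decoding on top of outlines, so a version mismatch between the two packages can surface as a server-side error on exactly this field. The model name is an assumption and the request is only constructed, not sent:

```python
# Sketch of a JSON-mode request for a vLLM OpenAI-compatible server.
# If this shape errors server-side, pinning compatible vllm/outlines
# versions is the usual first thing to check.
request_body = {
    "model": "glm-4-9b-chat",  # assumed served model name
    "messages": [{"role": "user", "content": "Reply with a JSON object"}],
    "response_format": {"type": "json_object"},
}
```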
-
For a production load of 2 QPS, what is the minimum GPU memory required? Recommendations for both the quantized and non-quantized versions would be appreciated, or a formula relating QPS to GPU memory.
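There is no exact closed-form "QPS to VRAM" formula, but a lower bound follows from Little's law: average in-flight requests = arrival rate × average latency, and each in-flight sequence holds its own KV cache. The figures below (weights, per-sequence KV size, latency, overhead) are illustrative assumptions:

```python
def concurrent_requests(qps, avg_latency_s):
    """Little's law: average number of in-flight requests = rate * latency."""
    return qps * avg_latency_s

def min_vram_gb(weight_gb, kv_gb_per_seq, qps, avg_latency_s, overhead_gb=2.0):
    # Lower-bound estimate: weights + one KV cache per concurrent sequence
    # + fixed overhead (CUDA context, activations). Assumes latency stays
    # flat as batch size grows, which batching only approximately preserves.
    return weight_gb + kv_gb_per_seq * concurrent_requests(qps, avg_latency_s) + overhead_gb

# Illustrative numbers: ~18 GB fp16 weights, ~0.35 GB KV per 8K sequence,
# 2 QPS at ~5 s average generation latency -> ~10 concurrent sequences.
print(f"{min_vram_gb(18, 0.35, qps=2, avg_latency_s=5):.1f} GB minimum")
```

Quantization shrinks the weight term (roughly 18 GB fp16 vs ~5-6 GB at int4 for a 9B model) but barely changes the KV term, so at higher concurrency the KV cache, not the weights, dominates. Measure your real average latency before trusting any such estimate.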
-
220, in generate_stream_glm4
-
D:\GLM4\GLM-4\basic_demo>python trans_web_demo.py
-
This section is for environment problems the community may run into under non-official configurations; everyone is welcome to help each other out.