Environment and Configuration Setup Discussion #41
Replies: 14 comments 2 replies
-
Does anyone have the vLLM configuration parameters?
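A possible starting point, sketched as a launch command for vLLM's OpenAI-compatible server. The flag names come from vLLM's public CLI; the model name and values here are illustrative assumptions, not official recommendations:

```python
# Hypothetical launch arguments for serving GLM-4-9B with vLLM's
# OpenAI-compatible server; the values are illustrative, not official.
model = "THUDM/glm-4-9b-chat"  # assumed model id

args = [
    "python", "-m", "vllm.entrypoints.openai.api_server",
    "--model", model,
    "--trust-remote-code",           # GLM-4 ships custom modeling code
    "--tensor-parallel-size", "2",   # split weights across two GPUs
    "--max-model-len", "8192",       # cap the context length to fit memory
    "--gpu-memory-utilization", "0.9",
]
print(" ".join(args))
```

Tune `--max-model-len` and `--gpu-memory-utilization` to your GPUs; lowering either is the usual first step when the server OOMs at startup.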
-
Has anyone run into this? The model loads and takes input normally, but GLM-4 errors while generating an answer: RuntimeError: cutlassF: no kernel found to launch!
-
vLLM deployment on two GPUs runs fine when launched directly with Python, but fails when started from the Docker image, reporting an NCCL Error.
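One way to narrow down a container-only NCCL failure is to turn on NCCL's own logging before the process group is created. `NCCL_DEBUG` and `NCCL_P2P_DISABLE` are standard NCCL environment variables; setting them here is a diagnostic sketch, not a fix:

```python
import os

# Diagnostic settings for NCCL failures inside containers; set these
# before torch/vLLM initializes distributed communication so the logs
# show which transport (shared memory, P2P, network) is failing.
os.environ["NCCL_DEBUG"] = "INFO"      # verbose NCCL log output
os.environ["NCCL_P2P_DISABLE"] = "1"   # rule out peer-to-peer transport issues
```

When running under Docker, also check the container flags: NCCL's shared-memory transport commonly needs `--ipc=host` or a larger `--shm-size` than Docker's 64 MB default, which is a frequent cause of "works on the host, fails in the image".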
-
Do the 128K and 1M versions of GLM-4 support function calling? As I recall, only the 8K version of GLM-3 supported it.
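For reference, this is the OpenAI-style tool-definition payload that GLM-4's OpenAI-compatible demo server accepts for function calling. Whether the long-context (128K/1M) checkpoints honor it is exactly the open question above; the payload shape itself is the standard one. The tool name and fields are hypothetical:

```python
# Hedged sketch of an OpenAI-style function-calling request body.
# "get_weather" and its schema are hypothetical; the structure
# ("type"/"function"/"parameters") is the standard tools format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Query current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

request_body = {
    "model": "glm-4",  # assumed model name on the demo server
    "messages": [{"role": "user", "content": "What's the weather in Beijing?"}],
    "tools": tools,
}
```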
-
File "/root/.cache/huggingface/modules/transformers_modules/chatglm4/modeling_chatglm.py", line 939, in _update_model_kwargs_for_generation
-
For vLLM inference with 9b-chat, how much GPU memory does an 8K context need? I couldn't find official figures in the docs.
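There is no official number in the docs, but a back-of-envelope estimate is possible from the model shape. The numbers below (40 layers, 2 grouped KV heads of dim 128, fp16) are assumptions about GLM-4-9B; verify them against `num_layers`, `multi_query_group_num`, and `kv_channels` in the model's `config.json`:

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, dtype_bytes=2):
    """Per-sequence KV-cache size: 2 (K and V) * layers * heads * dim * tokens."""
    return 2 * layers * kv_heads * head_dim * tokens * dtype_bytes

# Assumed GLM-4-9B shape; illustrative, check config.json before relying on it.
weights_gb = 9e9 * 2 / 1e9  # ~18 GB of fp16 weights for 9B parameters
kv_gb = kv_cache_bytes(8192, layers=40, kv_heads=2, head_dim=128) / 1e9

print(f"weights ≈ {weights_gb:.0f} GB, 8K KV cache ≈ {kv_gb:.2f} GB per sequence")
```

On top of this, vLLM pre-allocates a KV block pool according to `--gpu-memory-utilization`, and activations and CUDA context add a few more GB, so treat the sum as a lower bound, not a budget.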
-
If a domestic (Chinese) GPU supports PyTorch and TensorFlow, but the vendor's site doesn't say whether it supports CUDA or ROCm and only claims PyTorch/TensorFlow support, can GLM still be deployed on it?
-
Has anyone hit this error when converting to ONNX, and how should it be fixed? /usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py:1738: UserWarning: The exported ONNX model failed ONNX shape inference. The model will not be executable by the ONNX Runtime. If this is unintended and you believe there is a bug, please report an issue at https://github.com/pytorch/pytorch/issues. Error reported by strict ONNX shape inference: [ShapeInferenceError] (op_type:Add, node name: /Add): A typestr: T, has unsupported type: tensor(bool) (Triggered internally at ../torch/csrc/jit/serialization/export.cpp:1469.)
-
How do I use outlines? When I deploy the OpenAI-compatible server with vLLM and set response_format to json, it errors; it looks like it may be an outlines compatibility issue.
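For comparison, this is what the JSON-mode request body looks like against a vLLM OpenAI-compatible endpoint. vLLM implements guided/JSON decoding on top of outlines, so a version mismatch between the two packages can surface as a server-side error on exactly this field. The model name is an assumption and the request is only constructed, not sent:

```python
# Sketch of a JSON-mode request for a vLLM OpenAI-compatible server.
# If this shape errors server-side, pinning compatible vllm/outlines
# versions is the usual first thing to check.
request_body = {
    "model": "glm-4-9b-chat",  # assumed served model name
    "messages": [{"role": "user", "content": "Reply with a JSON object"}],
    "response_format": {"type": "json_object"},
}
```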
-
For a production load of 2 QPS, what is the minimum GPU memory required? Recommendations for both the quantized and non-quantized versions would be appreciated, or a formula relating QPS to GPU memory.
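There is no exact closed-form "QPS to VRAM" formula, but a lower bound follows from Little's law: average in-flight requests = arrival rate × average latency, and each in-flight sequence holds its own KV cache. The figures below (weights, per-sequence KV size, latency, overhead) are illustrative assumptions:

```python
def concurrent_requests(qps, avg_latency_s):
    """Little's law: average number of in-flight requests = rate * latency."""
    return qps * avg_latency_s

def min_vram_gb(weight_gb, kv_gb_per_seq, qps, avg_latency_s, overhead_gb=2.0):
    # Lower-bound estimate: weights + one KV cache per concurrent sequence
    # + fixed overhead (CUDA context, activations). Assumes latency stays
    # flat as batch size grows, which batching only approximately preserves.
    return weight_gb + kv_gb_per_seq * concurrent_requests(qps, avg_latency_s) + overhead_gb

# Illustrative numbers: ~18 GB fp16 weights, ~0.35 GB KV per 8K sequence,
# 2 QPS at ~5 s average generation latency -> ~10 concurrent sequences.
print(f"{min_vram_gb(18, 0.35, qps=2, avg_latency_s=5):.1f} GB minimum")
```

Quantization shrinks the weight term (roughly 18 GB fp16 vs ~5-6 GB at int4 for a 9B model) but barely changes the KV term, so at higher concurrency the KV cache, not the weights, dominates. Measure your real average latency before trusting any such estimate.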
-
220, in generate_stream_glm4
-
D:\GLM4\GLM-4\basic_demo>python trans_web_demo.py
-
This section is for environment problems the community may run into under non-official configurations; everyone is welcome to help each other out.