Error executing method determine_num_available_blocks #20

Open

empty2enrich opened this issue May 10, 2024 · 1 comment

@empty2enrich
Launching the OpenAI-compatible server with vLLM fails with an error, while the official demo script runs normally.

Launch command: python -m vllm.entrypoints.openai.api_server --model /data/huggingface/models--deepseek-ai--DeepSeek-V2-Chat/snapshots/cfa90959d985cd3288fd835519099d9c46fa4842 --tensor-parallel-size 8 --served-model-name deepseek-v2-chat --dtype auto --api-key none --trust-remote-code
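
For reference, the demo script that works runs offline inference in eager mode. A minimal sketch of that path (the model directory and tensor-parallel size are taken from the command above; the prompt and sampling settings are illustrative, not from the original report):

from vllm import LLM, SamplingParams

# Offline inference in eager mode; the server launch above uses CUDA graphs by default.
llm = LLM(
    model="/data/huggingface/models--deepseek-ai--DeepSeek-V2-Chat/snapshots/cfa90959d985cd3288fd835519099d9c46fa4842",
    tensor_parallel_size=8,
    trust_remote_code=True,
    enforce_eager=True,  # eager mode, as in the demo script
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Hello, who are you?"], sampling_params)
print(outputs[0].outputs[0].text)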

Error log:

(RayWorkerWrapper pid=1402517) INFO 05-10 20:16:16 selector.py:81] Cannot use FlashAttention-2 backend because the flash_attn package is not found. Please install it for better performance. [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) INFO 05-10 20:16:16 selector.py:32] Using XFormers backend. [repeated 6x across cluster]
Cache shape torch.Size([163840, 64])
(RayWorkerWrapper pid=1401736) Cache shape torch.Size([163840, 64])
(RayWorkerWrapper pid=1402517) INFO 05-10 20:16:18 pynccl_utils.py:43] vLLM is using nccl==2.20.5 [repeated 6x across cluster]
INFO 05-10 20:16:56 model_runner.py:175] Loading model weights took 56.1087 GB
(RayWorkerWrapper pid=1401736) INFO 05-10 20:17:00 model_runner.py:175] Loading model weights took 56.1087 GB
(RayWorkerWrapper pid=1402517) INFO 05-10 20:16:21 utils.py:132] reading GPU P2P access cache from /home/centos/.config/vllm/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) Cache shape torch.Size([163840, 64]) [repeated 6x across cluster]
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145] Error executing method determine_num_available_blocks. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145] Traceback (most recent call last):
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 137, in execute_method
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]     return executor(*args, **kwargs)
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]     return func(*args, **kwargs)
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/worker/worker.py", line 139, in determine_num_available_blocks
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]     self.model_runner.profile_run()
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]     return func(*args, **kwargs)
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 888, in profile_run
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]     self.execute_model(seqs, kv_caches)
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]     return func(*args, **kwargs)
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 808, in execute_model
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]     hidden_states = model_executable(**execute_model_kwargs)
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]     return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]     return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 429, in forward
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]     hidden_states = self.model(input_ids, positions, kv_caches,
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]     return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]     return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 400, in forward
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]     hidden_states, residual = layer(positions, hidden_states,
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]     return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]     return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 362, in forward
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]     hidden_states = self.mlp(hidden_states)
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]     return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]     return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 156, in forward
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]     final_hidden_states = fused_moe(hidden_states,
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 510, in fused_moe
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145]     return torch.sum(intermediate_cache3.view(*intermediate_cache3.shape),
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145] RuntimeError: CUDA error: an illegal memory access was encountered
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145] For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(RayWorkerWrapper pid=1402053) ERROR 05-10 20:17:12 worker_base.py:145] 
(RayWorkerWrapper pid=1402053) INFO 05-10 20:17:05 model_runner.py:175] Loading model weights took 56.1087 GB [repeated 6x across cluster]
ERROR 05-10 20:17:12 worker_base.py:145] Error executing method determine_num_available_blocks. This might cause deadlock in distributed execution.
ERROR 05-10 20:17:12 worker_base.py:145] Traceback (most recent call last):
ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 137, in execute_method
ERROR 05-10 20:17:12 worker_base.py:145]     return executor(*args, **kwargs)
ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-10 20:17:12 worker_base.py:145]     return func(*args, **kwargs)
ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/worker/worker.py", line 139, in determine_num_available_blocks
ERROR 05-10 20:17:12 worker_base.py:145]     self.model_runner.profile_run()
ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-10 20:17:12 worker_base.py:145]     return func(*args, **kwargs)
ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 888, in profile_run
ERROR 05-10 20:17:12 worker_base.py:145]     self.execute_model(seqs, kv_caches)
ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-10 20:17:12 worker_base.py:145]     return func(*args, **kwargs)
ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 808, in execute_model
ERROR 05-10 20:17:12 worker_base.py:145]     hidden_states = model_executable(**execute_model_kwargs)
ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
ERROR 05-10 20:17:12 worker_base.py:145]     return self._call_impl(*args, **kwargs)
ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
ERROR 05-10 20:17:12 worker_base.py:145]     return forward_call(*args, **kwargs)
ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 429, in forward
ERROR 05-10 20:17:12 worker_base.py:145]     hidden_states = self.model(input_ids, positions, kv_caches,
ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
ERROR 05-10 20:17:12 worker_base.py:145]     return self._call_impl(*args, **kwargs)
ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
ERROR 05-10 20:17:12 worker_base.py:145]     return forward_call(*args, **kwargs)
ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 400, in forward
ERROR 05-10 20:17:12 worker_base.py:145]     hidden_states, residual = layer(positions, hidden_states,
ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
ERROR 05-10 20:17:12 worker_base.py:145]     return self._call_impl(*args, **kwargs)
ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
ERROR 05-10 20:17:12 worker_base.py:145]     return forward_call(*args, **kwargs)
ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 362, in forward
ERROR 05-10 20:17:12 worker_base.py:145]     hidden_states = self.mlp(hidden_states)
ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
ERROR 05-10 20:17:12 worker_base.py:145]     return self._call_impl(*args, **kwargs)
ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
ERROR 05-10 20:17:12 worker_base.py:145]     return forward_call(*args, **kwargs)
ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 156, in forward
ERROR 05-10 20:17:12 worker_base.py:145]     final_hidden_states = fused_moe(hidden_states,
ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 510, in fused_moe
ERROR 05-10 20:17:12 worker_base.py:145]     return torch.sum(intermediate_cache3.view(*intermediate_cache3.shape),
ERROR 05-10 20:17:12 worker_base.py:145] RuntimeError: CUDA error: an illegal memory access was encountered
ERROR 05-10 20:17:12 worker_base.py:145] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
ERROR 05-10 20:17:12 worker_base.py:145] For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
ERROR 05-10 20:17:12 worker_base.py:145] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
ERROR 05-10 20:17:12 worker_base.py:145] 
[rank0]: Traceback (most recent call last):
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/runpy.py", line 196, in _run_module_as_main
[rank0]:     return _run_code(code, main_globals, None,
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/runpy.py", line 86, in _run_code
[rank0]:     exec(code, run_globals)
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 168, in <module>
[rank0]:     engine = AsyncLLMEngine.from_engine_args(
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 366, in from_engine_args
[rank0]:     engine = cls(
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 324, in __init__
[rank0]:     self.engine = self._init_engine(*args, **kwargs)
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 442, in _init_engine
[rank0]:     return engine_class(*args, **kwargs)
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 172, in __init__
[rank0]:     self._initialize_kv_caches()
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 249, in _initialize_kv_caches
[rank0]:     self.model_executor.determine_num_available_blocks())
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 27, in determine_num_available_blocks
[rank0]:     num_blocks = self._run_workers("determine_num_available_blocks", )
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 234, in _run_workers
[rank0]:     driver_worker_output = self.driver_worker.execute_method(
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 146, in execute_method
[rank0]:     raise e
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 137, in execute_method
[rank0]:     return executor(*args, **kwargs)
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/worker/worker.py", line 139, in determine_num_available_blocks
[rank0]:     self.model_runner.profile_run()
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 888, in profile_run
[rank0]:     self.execute_model(seqs, kv_caches)
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 808, in execute_model
[rank0]:     hidden_states = model_executable(**execute_model_kwargs)
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 429, in forward
[rank0]:     hidden_states = self.model(input_ids, positions, kv_caches,
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 400, in forward
[rank0]:     hidden_states, residual = layer(positions, hidden_states,
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 362, in forward
[rank0]:     hidden_states = self.mlp(hidden_states)
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 156, in forward
[rank0]:     final_hidden_states = fused_moe(hidden_states,
[rank0]:   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 510, in fused_moe
[rank0]:     return torch.sum(intermediate_cache3.view(*intermediate_cache3.shape),
[rank0]: RuntimeError: CUDA error: an illegal memory access was encountered
[rank0]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank0]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
[rank0]: Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145] Error executing method determine_num_available_blocks. This might cause deadlock in distributed execution. [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145] Traceback (most recent call last): [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 137, in execute_method [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]     return executor(*args, **kwargs) [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context [repeated 18x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]     return func(*args, **kwargs) [repeated 18x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/worker/worker.py", line 139, in determine_num_available_blocks [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]     self.model_runner.profile_run() [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 888, in profile_run [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]     self.execute_model(seqs, kv_caches) [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 808, in execute_model [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]     hidden_states = model_executable(**execute_model_kwargs) [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl [repeated 24x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]     return self._call_impl(*args, **kwargs) [repeated 24x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl [repeated 24x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]     return forward_call(*args, **kwargs) [repeated 24x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 156, in forward [repeated 24x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]     hidden_states = self.model(input_ids, positions, kv_caches, [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]     hidden_states, residual = layer(positions, hidden_states, [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]     hidden_states = self.mlp(hidden_states) [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]     final_hidden_states = fused_moe(hidden_states, [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]   File "/data/envs/ll3_3_ds2_vllm/lib/python3.10/site-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 510, in fused_moe [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]     return torch.sum(intermediate_cache3.view(*intermediate_cache3.shape), [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145] RuntimeError: CUDA error: an illegal memory access was encountered [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145] For debugging consider passing CUDA_LAUNCH_BLOCKING=1. [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. [repeated 6x across cluster]
(RayWorkerWrapper pid=1402517) ERROR 05-10 20:17:12 worker_base.py:145]  [repeated 6x across cluster]
Failed: Cuda error /home/runner/work/vllm/vllm/csrc/custom_all_reduce.cuh:475 'an illegal memory access was encountered'
[rank0]:[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
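
As the log itself suggests, rerunning with CUDA_LAUNCH_BLOCKING=1 should make the reported stack trace point at the failing kernel rather than a later API call (same command as above, only with the environment variable set; with Ray-based tensor parallelism the variable may also need to reach the worker processes):

CUDA_LAUNCH_BLOCKING=1 python -m vllm.entrypoints.openai.api_server --model /data/huggingface/models--deepseek-ai--DeepSeek-V2-Chat/snapshots/cfa90959d985cd3288fd835519099d9c46fa4842 --tensor-parallel-size 8 --served-model-name deepseek-v2-chat --dtype auto --api-key none --trust-remote-code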

@stack-heap-overflow (Contributor)
This might be a compatibility issue with the kernels vLLM uses?

You could try launching the API in eager mode and see whether the same problem occurs (the demo in the README also runs in eager mode): add --enforce-eager to the command-line arguments, as shown below.
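
With that flag, the launch command from the issue becomes (unchanged apart from the added --enforce-eager):

python -m vllm.entrypoints.openai.api_server --model /data/huggingface/models--deepseek-ai--DeepSeek-V2-Chat/snapshots/cfa90959d985cd3288fd835519099d9c46fa4842 --tensor-parallel-size 8 --served-model-name deepseek-v2-chat --dtype auto --api-key none --trust-remote-code --enforce-eager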
