
[Bug]: logprobs is not compatible with the OpenAI spec #4795

Open
GabrielBianconi opened this issue May 13, 2024 · 1 comment · May be fixed by #5031
Labels
bug (Something isn't working) · good first issue (Good for newcomers) · help wanted (Extra attention is needed)

Comments

@GabrielBianconi

Your current environment

I'm using Runpod Serverless vLLM (https://github.com/runpod-workers/worker-vllm), so I can't run the environment collection command. However, I confirmed that the issue is present in the codebase on main:

https://github.com/vllm-project/vllm/blob/0fca3cdcf265cd375bca684d951702b6b7adf65a/vllm/entrypoints/openai/protocol.py

🐛 Describe the bug

The behavior of logprobs=True does not match OpenAI's.

I identified two issues:

(1) vLLM throws an error when logprobs=True and top_logprobs is missing.

OpenAI works fine:

from openai import OpenAI

openai_client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

completion = openai_client.chat.completions.create(
  model="gpt-4-turbo-preview",
  messages=[
    {"role": "user", "content": "Hi!"}
  ],
  logprobs=True,
)
ChatCompletion(id='chatcmpl-9OY4XFK8suJ7ed0yw5vglbTsOZUt1', choices=[Choice(finish_reason='stop', index=0, logprobs=ChoiceLogprobs(content=[ChatCompletionTokenLogprob(token='Hello', bytes=[72, 101, 108, 108, 111], logprob=-0.0008963357, top_logprobs=[]), ChatCompletionTokenLogprob(token='!', bytes=[33], logprob=-9.729906e-06, top_logprobs=[]), ChatCompletionTokenLogprob(token=' How', bytes=[32, 72, 111, 119], logprob=-1.4140442e-05, top_logprobs=[]), ChatCompletionTokenLogprob(token=' can', bytes=[32, 99, 97, 110], logprob=-0.0004804817, top_logprobs=[]), ChatCompletionTokenLogprob(token=' I', bytes=[32, 73], logprob=-1.11603495e-05, top_logprobs=[]), ChatCompletionTokenLogprob(token=' assist', bytes=[32, 97, 115, 115, 105, 115, 116], logprob=-0.30324343, top_logprobs=[]), ChatCompletionTokenLogprob(token=' you', bytes=[32, 121, 111, 117], logprob=-5.5122365e-07, top_logprobs=[]), ChatCompletionTokenLogprob(token=' today', bytes=[32, 116, 111, 100, 97, 121], logprob=-1.700133e-05, top_logprobs=[]), ChatCompletionTokenLogprob(token='?', bytes=[63], logprob=-0.001247851, top_logprobs=[])]), message=ChatCompletionMessage(content='Hello! How can I assist you today?', role='assistant', function_call=None, tool_calls=None))], created=1715637873, model='gpt-4-0125-preview', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=9, prompt_tokens=9, total_tokens=18))

vLLM breaks:

# Placeholder endpoint for a vLLM OpenAI-compatible server; adjust to your deployment.
vllm_client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = vllm_client.chat.completions.create(
  model="...my-llama3-8b-finetune...",
  messages=[
    {"role": "user", "content": "Hi!"}
  ],
  logprobs=True,
)
ChatCompletion(id=None, choices=None, created=None, model=None, object='error', system_fingerprint=None, usage=None, code=400, message='Top logprobs must be set when logprobs is.', param=None, type='BadRequestError')

via

raise ValueError("Top logprobs must be set when logprobs is.")
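
For reference, a minimal sketch of how the check in protocol.py could be relaxed to match OpenAI: when logprobs=True and top_logprobs is unset, default it to 0 (empty top_logprobs lists) instead of raising. The model and validator below are simplified assumptions for illustration, not vLLM's actual definitions.

# Hypothetical, simplified request model; vLLM's actual
# ChatCompletionRequest in protocol.py has many more fields.
from typing import Optional

from pydantic import BaseModel, model_validator

class ChatCompletionRequest(BaseModel):
    logprobs: Optional[bool] = False
    top_logprobs: Optional[int] = None

    @model_validator(mode="after")
    def check_logprobs(self):
        if self.top_logprobs is not None:
            if not self.logprobs:
                raise ValueError(
                    "top_logprobs requires logprobs to be enabled")
            if not 0 <= self.top_logprobs <= 20:
                raise ValueError("top_logprobs must be between 0 and 20")
        elif self.logprobs:
            # Match OpenAI: accept logprobs=True without top_logprobs
            # and return empty top_logprobs lists (i.e. top-0).
            self.top_logprobs = 0
        return self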

(2) Even with top_logprobs=1, the behavior doesn't match.

OpenAI:

completion = openai_client.chat.completions.create(
  model="gpt-4-turbo-preview",
  messages=[
    {"role": "user", "content": "Hi!"}
  ],
  logprobs=True,
  top_logprobs=1,
)
ChatCompletion(id='chatcmpl-9OY4PwigZVtoM6vHXELby3NWqCyaX', choices=[Choice(finish_reason='stop', index=0, logprobs=ChoiceLogprobs(content=[ChatCompletionTokenLogprob(token='Hello', bytes=[72, 101, 108, 108, 111], logprob=-0.0008963357, top_logprobs=[TopLogprob(token='Hello', bytes=[72, 101, 108, 108, 111], logprob=-0.0008963357)]), ChatCompletionTokenLogprob(token='!', bytes=[33], logprob=-9.729906e-06, top_logprobs=[TopLogprob(token='!', bytes=[33], logprob=-9.729906e-06)]), ChatCompletionTokenLogprob(token=' How', bytes=[32, 72, 111, 119], logprob=-1.4140442e-05, top_logprobs=[TopLogprob(token=' How', bytes=[32, 72, 111, 119], logprob=-1.4140442e-05)]), ChatCompletionTokenLogprob(token=' can', bytes=[32, 99, 97, 110], logprob=-0.0004804817, top_logprobs=[TopLogprob(token=' can', bytes=[32, 99, 97, 110], logprob=-0.0004804817)]), ChatCompletionTokenLogprob(token=' I', bytes=[32, 73], logprob=-1.11603495e-05, top_logprobs=[TopLogprob(token=' I', bytes=[32, 73], logprob=-1.11603495e-05)]), ChatCompletionTokenLogprob(token=' assist', bytes=[32, 97, 115, 115, 105, 115, 116], logprob=-0.3164038, top_logprobs=[TopLogprob(token=' assist', bytes=[32, 97, 115, 115, 105, 115, 116], logprob=-0.3164038)]), ChatCompletionTokenLogprob(token=' you', bytes=[32, 121, 111, 117], logprob=-5.5122365e-07, top_logprobs=[TopLogprob(token=' you', bytes=[32, 121, 111, 117], logprob=-5.5122365e-07)]), ChatCompletionTokenLogprob(token=' today', bytes=[32, 116, 111, 100, 97, 121], logprob=-1.50940705e-05, top_logprobs=[TopLogprob(token=' today', bytes=[32, 116, 111, 100, 97, 121], logprob=-1.50940705e-05)]), ChatCompletionTokenLogprob(token='?', bytes=[63], logprob=-0.0011334282, top_logprobs=[TopLogprob(token='?', bytes=[63], logprob=-0.0011334282)])]), message=ChatCompletionMessage(content='Hello! How can I assist you today?', role='assistant', function_call=None, tool_calls=None))], created=1715637865, model='gpt-4-0125-preview', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=9, prompt_tokens=9, total_tokens=18))

vLLM:

completion = vllm_client.chat.completions.create(
  model="...my-llama3-8b-finetune...",
  messages=[
    {"role": "user", "content": "Hi!"}
  ],
  logprobs=True,
  top_logprobs=1,
)
ChatCompletion(id='cmpl-c9459bd09bb24a028fef65190d22248d', choices=[Choice(finish_reason='stop', index=0, logprobs=ChoiceLogprobs(content=None, text_offset=[0, 2, 3, 7, 9, 14, 18, 24, 25], token_logprobs=[-1.6077232360839844, -1.1920228004455566, -0.0803707167506218, -0.8764647841453552, -0.2521621584892273, -1.823885577323381e-05, -0.020511768758296967, -0.007883979007601738, -0.0015675650211051106], tokens=['Hi', '!', 'ĠHow', "'s", 'Ġyour', 'Ġday', 'Ġgoing', '?', '<|im_end|>'], top_logprobs=[{'Hey': -1.2327232360839844, 'Hi': -1.6077232360839844}, {'!': -1.1920228004455566, 'Ġthere': -0.44202280044555664}, {'ĠHow': -0.0803707167506218}, {"'s": -0.8764647841453552, 'Ġare': -0.6264647841453552}, {'Ġyour': -0.2521621584892273}, {'Ġday': -1.823885577323381e-05}, {'Ġgoing': -0.020511768758296967}, {'?': -0.007883979007601738}, {'<|im_end|>': -0.0015675650211051106}]), message=ChatCompletionMessage(content="Hi! How's your day going?", role='assistant', function_call=None, tool_calls=None))], created=2362885, model='REDACTED', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=9, prompt_tokens=12, total_tokens=21))

Notice that vLLM returns the legacy completion-style fields (tokens, token_logprobs, text_offset, and top_logprobs as a list of dicts) with content=None, whereas OpenAI returns a content list of per-token entries and none of those legacy fields.
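
To make the shape difference concrete, here is a hypothetical converter that rebuilds the OpenAI-spec chat logprobs payload (a content list of per-token entries) from vLLM's legacy completion-style fields. It only illustrates the target structure; it is not vLLM's serialization code.

# Hypothetical converter; the function name and signature are
# illustrative, not part of vLLM.
from typing import Optional

def to_openai_chat_logprobs(
    tokens: list[str],
    token_logprobs: list[float],
    top_logprobs: Optional[list[dict[str, float]]] = None,
) -> dict:
    content = []
    for i, (token, logprob) in enumerate(zip(tokens, token_logprobs)):
        top = top_logprobs[i] if top_logprobs else {}
        content.append({
            "token": token,
            "logprob": logprob,
            # OpenAI also returns the UTF-8 bytes of each token.
            "bytes": list(token.encode("utf-8")),
            "top_logprobs": [
                {"token": t, "logprob": lp,
                 "bytes": list(t.encode("utf-8"))}
                for t, lp in top.items()
            ],
        })
    # The chat spec nests everything under "content"; there are no
    # top-level tokens / token_logprobs / text_offset fields.
    return {"content": content}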


These issues break libraries that expect OpenAI-compatible responses, e.g. Rust's async_openai, which we use.

@GabrielBianconi GabrielBianconi added the bug Something isn't working label May 13, 2024
@simon-mo simon-mo added good first issue Good for newcomers help wanted Extra attention is needed labels May 14, 2024
@Etelis

Etelis commented May 16, 2024

I will take a look!
