
[Bug]: logprobs is not compatible with the OpenAI spec #4795

Open
GabrielBianconi opened this issue May 13, 2024 · 1 comment · May be fixed by #5031
Labels
bug (Something isn't working) · good first issue (Good for newcomers) · help wanted (Extra attention is needed)

Comments

@GabrielBianconi

Your current environment

I'm using Runpod Serverless vLLM (https://github.com/runpod-workers/worker-vllm), so I can't run the environment collection command. However, I confirmed that the issue is present in the codebase on main:

https://github.com/vllm-project/vllm/blob/0fca3cdcf265cd375bca684d951702b6b7adf65a/vllm/entrypoints/openai/protocol.py

🐛 Describe the bug

The behavior of logprobs=True does not match OpenAI's.

I identified two issues:

(1) vLLM throws an error when logprobs=True and top_logprobs is missing.

OpenAI works fine:

from openai import OpenAI

openai_client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

completion = openai_client.chat.completions.create(
  model="gpt-4-turbo-preview",
  messages=[
    {"role": "user", "content": "Hi!"}
  ],
  logprobs=True,
)
ChatCompletion(id='chatcmpl-9OY4XFK8suJ7ed0yw5vglbTsOZUt1', choices=[Choice(finish_reason='stop', index=0, logprobs=ChoiceLogprobs(content=[ChatCompletionTokenLogprob(token='Hello', bytes=[72, 101, 108, 108, 111], logprob=-0.0008963357, top_logprobs=[]), ChatCompletionTokenLogprob(token='!', bytes=[33], logprob=-9.729906e-06, top_logprobs=[]), ChatCompletionTokenLogprob(token=' How', bytes=[32, 72, 111, 119], logprob=-1.4140442e-05, top_logprobs=[]), ChatCompletionTokenLogprob(token=' can', bytes=[32, 99, 97, 110], logprob=-0.0004804817, top_logprobs=[]), ChatCompletionTokenLogprob(token=' I', bytes=[32, 73], logprob=-1.11603495e-05, top_logprobs=[]), ChatCompletionTokenLogprob(token=' assist', bytes=[32, 97, 115, 115, 105, 115, 116], logprob=-0.30324343, top_logprobs=[]), ChatCompletionTokenLogprob(token=' you', bytes=[32, 121, 111, 117], logprob=-5.5122365e-07, top_logprobs=[]), ChatCompletionTokenLogprob(token=' today', bytes=[32, 116, 111, 100, 97, 121], logprob=-1.700133e-05, top_logprobs=[]), ChatCompletionTokenLogprob(token='?', bytes=[63], logprob=-0.001247851, top_logprobs=[])]), message=ChatCompletionMessage(content='Hello! How can I assist you today?', role='assistant', function_call=None, tool_calls=None))], created=1715637873, model='gpt-4-0125-preview', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=9, prompt_tokens=9, total_tokens=18))

vLLM breaks:

# Placeholder endpoint for a vLLM OpenAI-compatible server; adjust to your deployment.
vllm_client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = vllm_client.chat.completions.create(
  model="...my-llama3-8b-finetune...",
  messages=[
    {"role": "user", "content": "Hi!"}
  ],
  logprobs=True,
)
ChatCompletion(id=None, choices=None, created=None, model=None, object='error', system_fingerprint=None, usage=None, code=400, message='Top logprobs must be set when logprobs is.', param=None, type='BadRequestError')

via

raise ValueError("Top logprobs must be set when logprobs is.")
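
For reference, a minimal sketch of how the check in protocol.py could be relaxed to match OpenAI: when logprobs=True and top_logprobs is unset, default it to 0 (empty top_logprobs lists) instead of raising. The model and validator below are simplified assumptions for illustration, not vLLM's actual definitions.

# Hypothetical, simplified request model; vLLM's actual
# ChatCompletionRequest in protocol.py has many more fields.
from typing import Optional

from pydantic import BaseModel, model_validator

class ChatCompletionRequest(BaseModel):
    logprobs: Optional[bool] = False
    top_logprobs: Optional[int] = None

    @model_validator(mode="after")
    def check_logprobs(self):
        if self.top_logprobs is not None:
            if not self.logprobs:
                raise ValueError(
                    "top_logprobs requires logprobs to be enabled")
            if not 0 <= self.top_logprobs <= 20:
                raise ValueError("top_logprobs must be between 0 and 20")
        elif self.logprobs:
            # Match OpenAI: accept logprobs=True without top_logprobs
            # and return empty top_logprobs lists (i.e. top-0).
            self.top_logprobs = 0
        return self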

(2) Even with top_logprobs=1, the behavior doesn't match.

OpenAI:

completion = openai_client.chat.completions.create(
  model="gpt-4-turbo-preview",
  messages=[
    {"role": "user", "content": "Hi!"}
  ],
  logprobs=True,
  top_logprobs=1,
)
ChatCompletion(id='chatcmpl-9OY4PwigZVtoM6vHXELby3NWqCyaX', choices=[Choice(finish_reason='stop', index=0, logprobs=ChoiceLogprobs(content=[ChatCompletionTokenLogprob(token='Hello', bytes=[72, 101, 108, 108, 111], logprob=-0.0008963357, top_logprobs=[TopLogprob(token='Hello', bytes=[72, 101, 108, 108, 111], logprob=-0.0008963357)]), ChatCompletionTokenLogprob(token='!', bytes=[33], logprob=-9.729906e-06, top_logprobs=[TopLogprob(token='!', bytes=[33], logprob=-9.729906e-06)]), ChatCompletionTokenLogprob(token=' How', bytes=[32, 72, 111, 119], logprob=-1.4140442e-05, top_logprobs=[TopLogprob(token=' How', bytes=[32, 72, 111, 119], logprob=-1.4140442e-05)]), ChatCompletionTokenLogprob(token=' can', bytes=[32, 99, 97, 110], logprob=-0.0004804817, top_logprobs=[TopLogprob(token=' can', bytes=[32, 99, 97, 110], logprob=-0.0004804817)]), ChatCompletionTokenLogprob(token=' I', bytes=[32, 73], logprob=-1.11603495e-05, top_logprobs=[TopLogprob(token=' I', bytes=[32, 73], logprob=-1.11603495e-05)]), ChatCompletionTokenLogprob(token=' assist', bytes=[32, 97, 115, 115, 105, 115, 116], logprob=-0.3164038, top_logprobs=[TopLogprob(token=' assist', bytes=[32, 97, 115, 115, 105, 115, 116], logprob=-0.3164038)]), ChatCompletionTokenLogprob(token=' you', bytes=[32, 121, 111, 117], logprob=-5.5122365e-07, top_logprobs=[TopLogprob(token=' you', bytes=[32, 121, 111, 117], logprob=-5.5122365e-07)]), ChatCompletionTokenLogprob(token=' today', bytes=[32, 116, 111, 100, 97, 121], logprob=-1.50940705e-05, top_logprobs=[TopLogprob(token=' today', bytes=[32, 116, 111, 100, 97, 121], logprob=-1.50940705e-05)]), ChatCompletionTokenLogprob(token='?', bytes=[63], logprob=-0.0011334282, top_logprobs=[TopLogprob(token='?', bytes=[63], logprob=-0.0011334282)])]), message=ChatCompletionMessage(content='Hello! How can I assist you today?', role='assistant', function_call=None, tool_calls=None))], created=1715637865, model='gpt-4-0125-preview', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=9, prompt_tokens=9, total_tokens=18))

vLLM:

completion = vllm_client.chat.completions.create(
  model="...my-llama3-8b-finetune...",
  messages=[
    {"role": "user", "content": "Hi!"}
  ],
  logprobs=True,
  top_logprobs=1,
)
ChatCompletion(id='cmpl-c9459bd09bb24a028fef65190d22248d', choices=[Choice(finish_reason='stop', index=0, logprobs=ChoiceLogprobs(content=None, text_offset=[0, 2, 3, 7, 9, 14, 18, 24, 25], token_logprobs=[-1.6077232360839844, -1.1920228004455566, -0.0803707167506218, -0.8764647841453552, -0.2521621584892273, -1.823885577323381e-05, -0.020511768758296967, -0.007883979007601738, -0.0015675650211051106], tokens=['Hi', '!', 'ĠHow', "'s", 'Ġyour', 'Ġday', 'Ġgoing', '?', '<|im_end|>'], top_logprobs=[{'Hey': -1.2327232360839844, 'Hi': -1.6077232360839844}, {'!': -1.1920228004455566, 'Ġthere': -0.44202280044555664}, {'ĠHow': -0.0803707167506218}, {"'s": -0.8764647841453552, 'Ġare': -0.6264647841453552}, {'Ġyour': -0.2521621584892273}, {'Ġday': -1.823885577323381e-05}, {'Ġgoing': -0.020511768758296967}, {'?': -0.007883979007601738}, {'<|im_end|>': -0.0015675650211051106}]), message=ChatCompletionMessage(content="Hi! How's your day going?", role='assistant', function_call=None, tool_calls=None))], created=2362885, model='REDACTED', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=9, prompt_tokens=12, total_tokens=21))

Notice that vLLM returns the legacy completion-style fields (tokens, token_logprobs, text_offset, and top_logprobs as a list of dicts) with content=None, whereas OpenAI returns a content list of per-token entries and none of those legacy fields.
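
To make the shape difference concrete, here is a hypothetical converter that rebuilds the OpenAI-spec chat logprobs payload (a content list of per-token entries) from vLLM's legacy completion-style fields. It only illustrates the target structure; it is not vLLM's serialization code.

# Hypothetical converter; the function name and signature are
# illustrative, not part of vLLM.
from typing import Optional

def to_openai_chat_logprobs(
    tokens: list[str],
    token_logprobs: list[float],
    top_logprobs: Optional[list[dict[str, float]]] = None,
) -> dict:
    content = []
    for i, (token, logprob) in enumerate(zip(tokens, token_logprobs)):
        top = top_logprobs[i] if top_logprobs else {}
        content.append({
            "token": token,
            "logprob": logprob,
            # OpenAI also returns the UTF-8 bytes of each token.
            "bytes": list(token.encode("utf-8")),
            "top_logprobs": [
                {"token": t, "logprob": lp,
                 "bytes": list(t.encode("utf-8"))}
                for t, lp in top.items()
            ],
        })
    # The chat spec nests everything under "content"; there are no
    # top-level tokens / token_logprobs / text_offset fields.
    return {"content": content}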


These issues break libraries that expect OpenAI-compatible responses, e.g. Rust's async_openai, which we use.

@GabrielBianconi GabrielBianconi added the bug Something isn't working label May 13, 2024
@simon-mo simon-mo added good first issue Good for newcomers help wanted Extra attention is needed labels May 14, 2024
@Etelis

Etelis commented May 16, 2024

I will take a look!
