TGI 2.0.2 CodeLlama error piece id is out of range. #1891

Closed
philschmid opened this issue May 14, 2024 · 1 comment · Fixed by #1947

philschmid commented May 14, 2024

System Info

ghcr.io/huggingface/text-generation-inference:2.0.2

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

model=philschmid/code-llama-7b-text-to-sql
num_shard=1
max_input_length=2048
max_total_tokens=4096
max_prefill_token=4096 # 4096

docker run --gpus all -ti -p 8080:80 \
  -e MODEL_ID=$model \
  -e NUM_SHARD=$num_shard \
  -e MAX_INPUT_LENGTH=$max_input_length \
  -e MAX_TOTAL_TOKENS=$max_total_tokens \
  -e MAX_BATCH_PREFILL_TOKENS=$max_prefill_token \
  -e HF_TOKEN=$(cat ~/.cache/huggingface/token) \
   ghcr.io/huggingface/text-generation-inference:2.0.2
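
Once the container reports it is ready, a quick smoke test against TGI's standard /generate route (not part of the original report; this sketch assumes the port mapping from the command above):

import requests

# Send one short generation request to the warmed-up server.
resp = requests.post(
    "http://127.0.0.1:8080/generate",
    json={"inputs": "SELECT", "parameters": {"max_new_tokens": 32}},
)
print(resp.json())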

Expected behavior

The endpoint starts and serves requests, as it did with version 2.0.0.

Error

2024-05-14T12:30:52.830987Z ERROR text_generation_launcher: Method Warmup encountered an error.
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 253, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
    return await self.intercept(
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 21, in intercept
    return await response
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
    raise error
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
    return await behavior(request_or_iterator, context)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 114, in Warmup
    max_supported_total_tokens = self.model.warmup(batch)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 776, in warmup
    _, batch, _ = self.generate_token(batch)
  File "/opt/conda/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 1206, in generate_token
    toptoken_texts = self.tokenizer.batch_decode(
  File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3771, in batch_decode
    return [
  File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3772, in <listcomp>
    self.decode(
  File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3811, in decode
    return self._decode(
  File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 1001, in _decode
    filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
  File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 973, in convert_ids_to_tokens
    return self._convert_id_to_token(ids)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 277, in _convert_id_to_token
    token = self.sp_model.IdToPiece(index)
  File "/opt/conda/lib/python3.10/site-packages/sentencepiece/__init__.py", line 1045, in _batched_func
    return _func(self, arg)
  File "/opt/conda/lib/python3.10/site-packages/sentencepiece/__init__.py", line 1038, in _func
    raise IndexError('piece id is out of range.')
IndexError: piece id is out of range.

2024-05-14T12:30:52.841680Z ERROR warmup{max_input_length=2048 max_prefill_tokens=4096 max_total_tokens=4096 max_batch_size=None}:warmup: text_generation_client: router/client/src/lib.rs:33: Server error: piece id is out of range.
Error: Warmup(Generation("piece id is out of range."))
2024-05-14T12:30:52.870991Z ERROR text_generation_launcher: Webserver Crashed
Narsil added a commit that referenced this issue May 24, 2024
- The need for the slow tokenizer default stems from back
  when llama 1 was introduced and all the flags were not
  supported in `tokenizers`.

- Fixes #1891
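
The commit message above describes dropping the slow-tokenizer default. As a rough sketch of that preference order (illustrative only, not the actual diff from #1947; load_tokenizer is a hypothetical helper):

from transformers import AutoTokenizer

def load_tokenizer(model_id: str):
    # Prefer the fast, tokenizers-backed implementation, which handles the
    # added tokens correctly; fall back to the slow sentencepiece one only
    # if the fast load fails.
    try:
        return AutoTokenizer.from_pretrained(model_id, use_fast=True)
    except Exception:
        return AutoTokenizer.from_pretrained(model_id, use_fast=False)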

Narsil commented May 24, 2024

Hmm "regression" must have happened in our dependency (either transformers or tokenizers).

Nothing changed in llama itself, but there's a code path that loads LlamaTokenizer first and falls back to AutoTokenizer (which should be the one used for CodeLlama).

However, it seems that LlamaTokenizer now loads successfully but fails to account for the added tokens, so the state of the tokenizer is invalid.
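
As an illustration of that failure mode (this snippet is not from the issue; the exact token ids and vocab sizes are assumptions): CodeLlama-style checkpoints carry added tokens whose ids lie beyond the base SentencePiece vocabulary, and if the slow tokenizer does not register them, decoding such an id ends up in sp_model.IdToPiece and raises the error from the traceback above.

from transformers import AutoTokenizer, LlamaTokenizer

model_id = "philschmid/code-llama-7b-text-to-sql"

fast = AutoTokenizer.from_pretrained(model_id)   # tokenizers backend
slow = LlamaTokenizer.from_pretrained(model_id)  # sentencepiece backend

print(len(fast))                         # full vocab, including added tokens
print(slow.sp_model.get_piece_size())    # base SentencePiece vocab only

# Pick an id from the added-token range (assumes added ids sit at the top of
# the vocab). The fast tokenizer decodes it; the slow one, if it failed to
# register the added tokens, hits sp_model.IdToPiece and raises
# IndexError: piece id is out of range.
token_id = len(fast) - 1
print(fast.decode([token_id]))
print(slow.decode([token_id]))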
