Issues: huggingface/text-generation-inference


Issues list

LoRA adapter from local model leads to error
#1893 opened May 14, 2024 by philschmid
2 of 4 tasks
TGI 2.0.2 CodeLlama error piece id is out of range.
#1891 opened May 14, 2024 by philschmid
2 of 4 tasks
Min P generation parameter
#1885 opened May 13, 2024 by LawrenceGrigoryan
Question about KV cache
#1883 opened May 13, 2024 by martinigoyanes
SnapKV support
#1881 opened May 13, 2024 by icyxp
concurrent requests permit limit is broken
#1877 opened May 10, 2024 by oOraph
1 of 4 tasks
text generation details not working when stream=False
#1876 opened May 10, 2024 by uyeongkim
2 of 4 tasks
Automatic NUMA binding
#1874 opened May 10, 2024 by fxmarty
[Question] Onnx support in TGI
#1873 opened May 9, 2024 by Ben-Epstein
Regarding llama3-70b-instruct
#1864 opened May 6, 2024 by chintanshrinath
Install error when installing the vllm package
#1862 opened May 6, 2024 by for-just-we
2 of 4 tasks
TGI 2.0.2 encounters "CUDA is not available"
#1861 opened May 6, 2024 by Cucunnber
2 of 4 tasks