triton-inference-server / server Public

Notifications
Fork 1.4k
Star 7.4k

Code
Issues 409
Pull requests 51
Discussions
Actions
Security
Insights

Additional navigation options

Code
Issues
Pull requests
Discussions
Actions
Security
Insights

Issues: triton-inference-server/server

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

409 Open 3,075 Closed

Author

Filter by author

Label

Filter by label

Use alt + click/return to exclude labels

or ⇧ + click/return for logical OR

Milestones

Filter by milestone

Assignee

Filter by who’s assigned

Assigned to nobody

Sort

Sort by

Newest Oldest Most commented Least commented Recently updated Least recently updated Best match

Most reactions

Issues list

repeated answer:When I use vllm with Qwen-7b-chat the generated text is x lnot end until the maength, with the repeated answer

#7215 opened May 14, 2024 by ChengShuting

Inference in Triton ensemble model is much slower than single model in Triton

#7214 opened May 14, 2024 by AWallyAllah

Multiple CPU instances results in decreasing of infer speed.

#7213 opened May 14, 2024 by Voveka98

Questions about serving PyTorch LLM in Python backend with token streaming using "Decoupled Mode" question

Further information is requested

#7210 opened May 12, 2024 by jackylu0124

How to enable nsys when starting a Triton server using Python API question

Further information is requested

#7209 opened May 11, 2024 by jerry605

Query Regarding Custom Metrics For Python Backend question

Further information is requested

#7204 opened May 10, 2024 by AniForU

Perf_analyzer reported metrics for decoupled model

#7203 opened May 10, 2024 by ZhanqiuHu

Triton Server OpenVINO backend not working with Tensorflow saved models bug

Something isn't working

#7200 opened May 9, 2024 by atobiszei

triton infer server docker image not working on Jetson Orin NX 16 GB JP 5.1.1

#7199 opened May 9, 2024 by allan-navarro

Metrics Port Not Opening with Triton Inference Server's In-Process Python API

#7197 opened May 8, 2024 by yucai

GRPC infer returns null in outputs contents

#7191 opened May 7, 2024 by aohorodnyk

Memory leak with multiple GPU and BLS.

#7190 opened May 7, 2024 by kbegiedza

Unexpected reshaping of output

#7189 opened May 7, 2024 by lemousehunter

How to specify the TensorRT version in Triton Server for inference? question

Further information is requested

#7188 opened May 7, 2024 by Gcstk

grpc request performance issue

#7187 opened May 7, 2024 by Yuyaying-winnie

Cannot use model-analyzer on ONNX classification model with dynamic input question

Further information is requested

#7184 opened May 6, 2024 by siretru

Dynamically Limit Endpoint Access enhancement

New feature or request

#7183 opened May 5, 2024 by amoosebitmymom

Is onnxruntime-genai supported? question

Further information is requested

#7182 opened May 4, 2024 by jackylu0124

ORT-TRT backend uses too much CPU memory

#7180 opened May 2, 2024 by ShuaiShao93

Unable to use triton client with shared memory in C++ (Jetpack 6 device) module: platforms

Issues related to platforms, hardware, and support matrix

#7177 opened May 1, 2024 by ganeshmojow

Manually update model repository index

#7173 opened Apr 30, 2024 by vilkkiE

Input data/shape validation bug

Something isn't working

question

Further information is requested

#7171 opened Apr 29, 2024 by HennerM

Python Backend: one model instance over multiple GPUs

#7169 opened Apr 29, 2024 by CollinHU

Perf Analyzer Error: Cannot send stop request without specifying a request_id

#7168 opened Apr 28, 2024 by MuyeMikeZhang

[Question] Is it possible to shutdown Triton if we detect certain cuda errors ?

#7164 opened Apr 26, 2024 by MatthieuToulemont

Previous 1 2 3 4 5 … 16 17 Next

Previous Next

ProTip! Updated in the last three days: updated:>2024-05-11.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly