Hey all, I have a quick question: is onnxruntime-genai (https://onnxruntime.ai/docs/genai/api/python.html) supported in Triton Inference Server's ONNX Runtime backend? I couldn't find anything relevant in the documentation. Thanks!
@jackylu0124 Support for onnxruntime-genai is currently a work in progress. The Python bindings should work within the python backend, but we haven't had a chance to test that ourselves yet.
That being said, we are actively investigating support. Can you share more about your use case and the timeline you need for support?
Hi @nnshah1, thank you very much for your fast reply! By "the Python bindings should work within the python backend", do you mean that I can import onnxruntime_genai and write custom inference logic myself in the Python backend, as opposed to having Triton Inference Server automatically manage the .onnx model files (that use onnxruntime-genai) in the model repository for me (a feature currently in development)? Is my understanding correct?
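In other words, something roughly like the following `model.py` sketch for the Python backend? This is only a guess at how it would look; the tensor names (`prompt`, `completion`), the model directory layout, and the exact onnxruntime-genai generation loop are assumptions on my part (the genai API has changed between releases), so it would need a Triton runtime and a real exported model to actually run.

```python
# Hypothetical model.py for Triton's python backend that wraps
# onnxruntime-genai directly. Tensor names and paths are placeholders.
import numpy as np
import triton_python_backend_utils as pb_utils
import onnxruntime_genai as og


class TritonPythonModel:
    def initialize(self, args):
        # Load the exported genai model directory shipped alongside
        # this model.py in the model repository (path is an assumption).
        model_dir = args["model_repository"] + "/1/genai_model"
        self.model = og.Model(model_dir)
        self.tokenizer = og.Tokenizer(self.model)

    def execute(self, requests):
        responses = []
        for request in requests:
            # Read a single UTF-8 prompt string from the request.
            prompt = pb_utils.get_input_tensor_by_name(request, "prompt")
            text = prompt.as_numpy()[0].decode("utf-8")

            # Generation loop; exact call sequence may differ by
            # onnxruntime-genai version.
            params = og.GeneratorParams(self.model)
            params.input_ids = self.tokenizer.encode(text)
            generator = og.Generator(self.model, params)

            tokens = []
            while not generator.is_done():
                generator.generate_next_token()
                tokens.append(generator.get_next_tokens()[0])

            out = self.tokenizer.decode(tokens)
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[
                    pb_utils.Tensor(
                        "completion",
                        np.array([out.encode("utf-8")], dtype=np.object_),
                    )
                ])
            )
        return responses
```

That is, Triton would only see an opaque python backend model, and all tokenization/decoding would live in my own code rather than in the ONNX Runtime backend.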
My use case is mainly serving LLMs, some of which are ONNX models that depend on onnxruntime_genai. I don't have a specific timeline; I am mainly interested in knowing whether this feature is on Triton Inference Server's development roadmap.
Also, a follow-up question: for serving LLMs, what would be the best backend for achieving token streaming, outside of the TensorRT-LLM backend?