
Metrics Port Not Opening with Triton Inference Server's In-Process Python API #7197

Open
yucai opened this issue May 8, 2024 · 1 comment

Comments

@yucai commented May 8, 2024

Description

We are encountering an issue with the Triton Inference Server's in-process Python API where the metrics port (default: 8002) does not open. This results in a 'connection refused' error when attempting to access localhost:8002/metrics. We would appreciate guidance on how to properly enable the metrics port using the in-process Python API.

Triton Version

2.42.0

Steps to reproduce the behavior

  1. Initialize the Triton Inference Server using the in-process Python API with the following code snippet:
import tritonserver

# Initialize and start the Triton server (this runs inside our wrapper class;
# model_repository is the path, or list of paths, to the model repository)
self._triton_server = tritonserver.Server(
    model_repository=model_repository,
    model_control_mode=tritonserver.ModelControlMode.EXPLICIT,
)
self._triton_server.start(wait_until_ready=True)
  2. Attempt to access the metrics endpoint at localhost:8002/metrics (see the check below).
  3. Observe the 'connection refused' error.
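For reference, this is how we confirm the failure from Python (a minimal check using only the standard library; the URL matches the default metrics port):

import urllib.error
import urllib.request

# Probe the default Triton metrics endpoint; with only the in-process API
# running, the connection is refused because no metrics HTTP server is listening.
try:
    with urllib.request.urlopen("http://localhost:8002/metrics", timeout=5) as resp:
        print(resp.read().decode())
except urllib.error.URLError as exc:
    print(f"Metrics endpoint not reachable: {exc.reason}")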

Expected behavior

The metrics port should be accessible and provide metrics data when the Triton Inference Server is started using the in-process Python API.

Temporary Workaround

As a temporary solution, we have started an HTTP server manually to serve the metrics endpoint:

import tritonserver
import uvicorn
import threading
from fastapi import FastAPI
from starlette.responses import Response

# Initialize and start the Triton server, then load the model explicitly
# (required because model_control_mode is EXPLICIT)
self._triton_server = tritonserver.Server(
    model_repository=['/mount/data/models'],
    model_control_mode=tritonserver.ModelControlMode.EXPLICIT,
)
self._triton_server.start(wait_until_ready=True)
self._triton_server.load('clip')
self._model = self._triton_server.model('clip')

# Set up a FastAPI application to serve metrics
self.app = FastAPI()

@self.app.get("/metrics")
def get_metrics():
    # server.metrics() returns the metrics report as Prometheus-format text
    output = self._triton_server.metrics()
    return Response(output, media_type="text/plain")

# Run the FastAPI app in a separate thread
def run():
    uvicorn.run(self.app, host="0.0.0.0", port=8002)

self.server = threading.Thread(target=run)
self.server.start()

We would prefer to use the built-in functionality for serving metrics and avoid maintaining this workaround. Any suggestions or solutions would be greatly appreciated.
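For completeness, the same workaround can also be written with only the standard library (a minimal sketch; serve_metrics is just an illustrative helper name, and it assumes server.metrics() returns the metrics text as a string, as the FastAPI version above implies):

import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

def serve_metrics(triton_server, port=8002):
    # Serve triton_server.metrics() at /metrics on a background thread.
    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            # Assumes metrics() returns the report as a plain-text string
            body = triton_server.metrics().encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    httpd = ThreadingHTTPServer(("0.0.0.0", port), MetricsHandler)
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    return httpd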

@yucai (Author) commented May 8, 2024

@nnshah1 We are using this API with Ray Data, very similar to what you did for Ray Serve, like below:
https://github.com/triton-inference-server/tutorials/blob/main/Triton_Inference_Server_Python_API/examples/rayserve/tritonserver_deployment.py
