
Metrics Port Not Opening with Triton Inference Server's In-Process Python API #7197

Open
yucai opened this issue May 8, 2024 · 1 comment

Comments

@yucai commented May 8, 2024

Description

We are encountering an issue with the Triton Inference Server's in-process Python API where the metrics port (default: 8002) does not open. This results in a 'connection refused' error when attempting to access localhost:8002/metrics. We would appreciate guidance on how to properly enable the metrics port using the in-process Python API.

Triton Version

2.42.0

Steps to reproduce the behavior

  1. Initialize the Triton Inference Server using the in-process Python API with the following code snippet:
import tritonserver

# Initialize and start the Triton server (this runs inside our wrapper class;
# model_repository is the path, or list of paths, to the model repository)
self._triton_server = tritonserver.Server(
    model_repository=model_repository,
    model_control_mode=tritonserver.ModelControlMode.EXPLICIT,
)
self._triton_server.start(wait_until_ready=True)
  2. Attempt to access the metrics endpoint at localhost:8002/metrics (see the check below).
  3. Observe the 'connection refused' error.
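For reference, this is how we confirm the failure from Python (a minimal check using only the standard library; the URL matches the default metrics port):

import urllib.error
import urllib.request

# Probe the default Triton metrics endpoint; with only the in-process API
# running, the connection is refused because no metrics HTTP server is listening.
try:
    with urllib.request.urlopen("http://localhost:8002/metrics", timeout=5) as resp:
        print(resp.read().decode())
except urllib.error.URLError as exc:
    print(f"Metrics endpoint not reachable: {exc.reason}")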

Expected behavior

The metrics port should be accessible and provide metrics data when the Triton Inference Server is started using the in-process Python API.

Temporary Workaround

As a temporary solution, we have started an HTTP server manually to serve the metrics endpoint:

import tritonserver
import uvicorn
import threading
from fastapi import FastAPI
from starlette.responses import Response

# Initialize and start the Triton server, then load the model explicitly
# (required because model_control_mode is EXPLICIT)
self._triton_server = tritonserver.Server(
    model_repository=['/mount/data/models'],
    model_control_mode=tritonserver.ModelControlMode.EXPLICIT,
)
self._triton_server.start(wait_until_ready=True)
self._triton_server.load('clip')
self._model = self._triton_server.model('clip')

# Set up a FastAPI application to serve metrics
self.app = FastAPI()

@self.app.get("/metrics")
def get_metrics():
    # server.metrics() returns the metrics report as Prometheus-format text
    output = self._triton_server.metrics()
    return Response(output, media_type="text/plain")

# Run the FastAPI app in a separate thread
def run():
    uvicorn.run(self.app, host="0.0.0.0", port=8002)

self.server = threading.Thread(target=run)
self.server.start()

We would prefer to use the built-in functionality for serving metrics and avoid maintaining this workaround. Any suggestions or solutions would be greatly appreciated.
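For completeness, the same workaround can also be written with only the standard library (a minimal sketch; serve_metrics is just an illustrative helper name, and it assumes server.metrics() returns the metrics text as a string, as the FastAPI version above implies):

import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

def serve_metrics(triton_server, port=8002):
    # Serve triton_server.metrics() at /metrics on a background thread.
    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            # Assumes metrics() returns the report as a plain-text string
            body = triton_server.metrics().encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    httpd = ThreadingHTTPServer(("0.0.0.0", port), MetricsHandler)
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    return httpd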

@yucai (Author) commented May 8, 2024

@nnshah1 We are using this API with Ray Data, very similar to what you did for Ray Serve, like below:
https://github.com/triton-inference-server/tutorials/blob/main/Triton_Inference_Server_Python_API/examples/rayserve/tritonserver_deployment.py
