
How to build a llama-cpp-python Container with CUDA? #459

Open
rsoika opened this issue Apr 2, 2024 · 2 comments

Comments


rsoika commented Apr 2, 2024

Hi, I just have a question and hope that someone here can help me out, as I am now three days into an installation odyssey.

I have written a small Python-based REST API to run the Mistral-7B model with llama-cpp-python in a Docker container. Everything works fine on my Linux notebook without a GPU.

Now I have ordered a server (Intel Core i7-7700 + GeForce GTX 1080). The goal, of course, is to use the GPU. So I installed the NVIDIA drivers on the host and verified with nvidia-smi that everything is working.

The big question I haven't been able to find an answer to for days is: how can I build a Docker image with llama-cpp-python that uses my host's GPU? The whole thing feels like rocket science, and I'm deeply frustrated.

Unfortunately, the dustynv/cuda-python images don't work for me either. The error message is:

The requested image's platform (linux/arm64) does not match the detected host platform (linux/amd64/v3) and no specific platform was requested
exec /bin/bash: exec format error

Does anyone know of an easy-to-understand guide for doing something like this? As I said, the host already has the NVIDIA drivers. I didn't expect it to be so complicated to get my container to use the GPU.

Thanks for any kind of help.

dusty-nv (Owner) commented Apr 3, 2024

Hi @rsoika, yes, as you have found, all the container images from this repo are built for Jetson (ARM64 + CUDA). However, if you check my llama_cpp Dockerfile, you can see how I build it (you would just use an NGC CUDA base image for x86 instead):

https://github.com/dusty-nv/jetson-containers/blob/master/packages/llm/llama_cpp/Dockerfile

Note how I compile llama_cpp_python with the -DLLAMA_CUBLAS=on -DLLAMA_CUDA_F16=1 flags in there.
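
For an x86 host, the equivalent build step would look roughly like this (a minimal sketch, assuming an nvidia/cuda devel base image instead of the Jetson base used in the linked Dockerfile):

# Build llama-cpp-python from source with CUDA support (sketch for an x86 image)
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on -DLLAMA_CUDA_F16=1" \
    pip3 install --no-cache-dir --verbose llama-cpp-python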

rsoika (Author) commented May 6, 2024

I am now using the nvidia/cuda Docker image as the base image and install the llama-cpp-python part on top of it. This works well.

This is what my Dockerfile looks like:

# See: https://github.com/abetlen/llama-cpp-python/blob/main/docker/cuda_simple/Dockerfile
ARG CUDA_IMAGE="12.1.1-devel-ubuntu22.04"
FROM nvidia/cuda:${CUDA_IMAGE}

# Install Python3
RUN apt-get update && apt-get upgrade -y \
    && apt-get install -y build-essential python3 python3-pip gcc 

# Set build-related environment variables
ENV CUDA_DOCKER_ARCH=all
ENV LLAMA_CUBLAS=1

# Install llama-cpp-python (build with cuda)
RUN python3 -m pip install --upgrade pip pytest cmake fastapi uvicorn
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --upgrade llama-cpp-python

# Install fastapi-xml and copy the app
RUN pip install fastapi-xml
COPY ./app /app
WORKDIR /app

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
