Wheel provenance

The vllm-node image installs vLLM and flashinfer from wheels built locally during the Docker build, so there is no upstream wheel URL to point to. Instead, the source commit of each wheel is recorded as a git SHA in a marker file inside the image:

/workspace/wheels/.vllm-commit         = ace95c9cf
/workspace/wheels/.flashinfer-commit   = 1d54c5c6
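
A quick way to confirm that an image matches these refs is to compare each marker file against the expected SHA. The following is a minimal sketch: the helper name check_marker is hypothetical, and it assumes each marker file holds just the commit SHA (short or full) on its first line.

```shell
# check_marker FILE EXPECTED_SHA
# Succeeds when the first line of FILE starts with EXPECTED_SHA.
# Assumption: the marker file contains only the commit SHA.
check_marker() {
    marker_file=$1
    expected=$2
    if [ ! -r "$marker_file" ]; then
        echo "missing: $marker_file" >&2
        return 2
    fi
    recorded=$(head -n 1 "$marker_file")
    case "$recorded" in
        "$expected"*)
            echo "ok: $marker_file = $recorded"
            ;;
        *)
            echo "mismatch: $marker_file has '$recorded', expected '$expected'" >&2
            return 1
            ;;
    esac
}

# Example, run inside the container:
#   check_marker /workspace/wheels/.vllm-commit ace95c9cf
#   check_marker /workspace/wheels/.flashinfer-commit 1d54c5c6
```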

Pip records each wheel's install source as a local file:

file:///workspace/wheels/vllm-0.22.1rc1.dev124+gace95c9cf.d20260603.cu132-cp312-cp312-linux_aarch64.whl
file:///workspace/wheels/flashinfer_python-0.6.12+1d54c5c6-cp39-abi3-linux_aarch64.whl
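
Both filenames carry the commit in the local version segment (the part after the +). The vLLM wheel prefixes it with g, which looks like the setuptools-scm convention, while the flashinfer wheel does not. As a sketch, the SHA can be pulled back out of either name with plain parameter expansion; the function name sha_from_wheel is hypothetical and the parsing only assumes the two filename shapes shown above.

```shell
# sha_from_wheel PATH_OR_URL
# Prints the commit SHA embedded in a wheel filename's local version segment.
# Returns 1 if the filename has no '+' (no local version).
sha_from_wheel() {
    name=${1##*/}                 # drop any directory or file:// prefix
    case "$name" in
        *+*) ;;
        *) return 1 ;;
    esac
    local_ver=${name#*+}          # everything after the first '+'
    local_ver=${local_ver%%-*}    # cut at the first '-' (start of the python tag)
    first=${local_ver%%.*}        # first dot-separated piece of the local version
    printf '%s\n' "${first#g}"    # strip a leading 'g' if present
}

# Example:
#   sha_from_wheel file:///workspace/wheels/flashinfer_python-0.6.12+1d54c5c6-cp39-abi3-linux_aarch64.whl
```

Comparing this value with the matching marker file is a cheap cross-check that the installed wheel and the recorded commit agree.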

To reproduce the exact image used in these blog posts, build the vllm-node image from the eugr/spark-vllm-docker repo with those refs passed as build args:

git clone https://github.com/eugr/spark-vllm-docker.git
cd spark-vllm-docker
./build-and-copy.sh \
    --vllm-ref ace95c9cf \
    --flashinfer-ref 1d54c5c6

(build-and-copy.sh is the repo's wrapper around docker buildx build that also distributes the resulting image tarball to all worker GX10s.)

What the Dockerfile does

Multi-stage build, base = nvidia/cuda:13.2.0-devel-ubuntu24.04:

  1. base — installs torch==2.11.0 from https://download.pytorch.org/whl/cu130, plus cuDNN 9 for CUDA 13. TORCH_CUDA_ARCH_LIST="12.1a" is set here (Blackwell sm_121a).

  2. NCCL — built from https://github.com/zyang-dev/nccl.git branch dgxspark-3node-ring, gencode arch=compute_121,code=sm_121. Installed as .deb.

  3. flashinfer-builder — clones https://github.com/flashinfer-ai/flashinfer.git, checks out ${FLASHINFER_REF} (= 1d54c5c6 for this image), builds the flashinfer-python, flashinfer-cubin, and flashinfer-jit-cache wheels into /workspace/wheels/, and writes .flashinfer-commit.

  4. vllm-builder — clones https://github.com/vllm-project/vllm.git, checks out ${VLLM_REF} (= ace95c9cf), strips flashinfer, triton, and fastsafetensors from the requirements (they come from the pre-built wheels), then runs uv build --wheel and writes .vllm-commit.

  5. runner — fresh cuda:13.2.0-devel, bind-mounts the wheels directory, installs everything with uv pip install.
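
The requirement stripping in step 4 can be pictured as a simple line filter over a requirements file. This is an illustrative sketch only, not the Dockerfile's actual commands: the function name strip_prebuilt and the matching rule (drop any line whose package name starts with one of the three names) are assumptions.

```shell
# strip_prebuilt
# Reads requirement lines on stdin and prints them without the packages
# that are supplied by the locally built wheels.
# Assumption: one requirement per line, package name at the start of the line.
strip_prebuilt() {
    # grep -v exits 1 when it prints nothing; treat that as success too.
    grep -v -i -E '^[[:space:]]*(flashinfer|triton|fastsafetensors)([^[:alnum:]]|$)' \
        || [ "$?" -eq 1 ]
}

# Example (requirements.txt is a placeholder path):
#   strip_prebuilt < requirements.txt > requirements.stripped.txt
```

Filtering on the name prefix also removes variants such as flashinfer-python, which is the intent here since every flashinfer wheel comes from the earlier builder stage.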

Known regression — torch metadata lies

vLLM wheels built from ace95c9cf ship metadata pinning torch==2.10.0, but the compiled extensions need the torch==2.11.0 ABI. Any later pip install that re-resolves the torch dependency will silently downgrade torch to 2.10.0 and break the runtime. To restore the correct version:

pip install --force-reinstall --no-deps torch==2.11.0 \
    --index-url https://download.pytorch.org/whl/cu130

The Dockerfile above installs torch==2.11.0 before the wheels, so a clean image is already correct. The problem only bites on rebuilds and post-hoc pip changes.
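
Because the downgrade is silent, it can help to check the installed torch version after any pip change. A minimal sketch, assuming the version string has the usual base+local form (for example 2.11.0+cu130); the helper name torch_matches is hypothetical.

```shell
# torch_matches INSTALLED REQUIRED
# Succeeds when the base version of INSTALLED (text before any '+')
# equals REQUIRED exactly.
torch_matches() {
    case "${1%%+*}" in
        "$2") return 0 ;;
        *) return 1 ;;
    esac
}

# Example, run inside the container after a pip change:
#   installed=$(python3 -c 'import torch; print(torch.__version__)')
#   torch_matches "$installed" 2.11.0 \
#       || echo "torch is $installed, expected 2.11.0; reinstall as shown above" >&2
```

A complementary guard, offered as a suggestion rather than something the repo does, is to pass pip a constraints file containing torch==2.11.0 via its --constraint option, so that a conflicting resolution fails loudly instead of downgrading.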