Background

docker stats exposes CPU and memory. nvidia-smi exposes total GPU utilization and per-process VRAM by PID. Neither tells you which container is consuming how much VRAM. Getting from a PID to a container means reading /proc/{pid}/cgroup for each PID in the NVML process list and matching the cgroup scope: under cgroup v2 with systemd the scope is named docker-{container_id}.scope, which reveals the mapping. Docker's State.Pid points the other way, from a container to its init PID only, so the cgroup walk is the direction that covers every GPU process.
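
A minimal sketch of that lookup, assuming the cgroup layout above (function name and error handling are illustrative, not Boje's actual code):

    use std::fs;

    // Read /proc/{pid}/cgroup and pull the container ID out of a
    // trailing "docker-{container_id}.scope" path segment, if any.
    fn container_id_for_pid(pid: u32) -> Option<String> {
        let contents = fs::read_to_string(format!("/proc/{pid}/cgroup")).ok()?;
        // cgroup v2 exposes a single line: 0::/system.slice/docker-<id>.scope
        contents.lines().find_map(|line| {
            let path = line.rsplit(':').next()?;
            let scope = path.rsplit('/').next()?;
            let id = scope.strip_prefix("docker-")?.strip_suffix(".scope")?;
            Some(id.to_string())
        })
    }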

Motivation

When running multiple inference containers on a single GPU node, watch docker stats is not enough, and nvidia-smi does not know about containers. Correlating the two by hand is annoying at best, and frustrating when things are crashing. This friction should not exist.

What Boje Does

A terminal UI for docker ps. Fixed-column list of running containers with live CPU, memory, net I/O, and per-container VRAM. Select a container to expand it inline with cur/min/max statistics and a 30-sample sparkline per metric. Press l for a live log tail without leaving the view.
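
The per-metric window behind those sparklines can be a simple fixed-capacity buffer; this is one plausible shape, with hypothetical type and method names rather than Boje's actual code:

    use std::collections::VecDeque;

    struct MetricWindow {
        samples: VecDeque<f64>,
        cap: usize,
    }

    impl MetricWindow {
        fn new(cap: usize) -> Self {
            Self { samples: VecDeque::with_capacity(cap), cap }
        }

        // Push a sample, evicting the oldest once the window is full.
        fn push(&mut self, v: f64) {
            if self.samples.len() == self.cap {
                self.samples.pop_front();
            }
            self.samples.push_back(v);
        }

        // (cur, min, max) over the window, as shown in the expanded view.
        fn stats(&self) -> Option<(f64, f64, f64)> {
            let cur = *self.samples.back()?;
            let min = self.samples.iter().copied().fold(f64::INFINITY, f64::min);
            let max = self.samples.iter().copied().fold(f64::NEG_INFINITY, f64::max);
            Some((cur, min, max))
        }
    }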

Implementation

Rust with ratatui for rendering and bollard for the Docker API. NVML is synchronous and runs on a dedicated thread rather than on the tokio runtime to avoid blocking the event loop.
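
A minimal sketch of that split, with poll_vram() standing in for the real NVML queries (the names and the one-second interval are assumptions):

    use std::{collections::HashMap, thread, time::Duration};
    use tokio::sync::mpsc;

    // Stand-in for the real NVML query: pid -> bytes of VRAM in use.
    fn poll_vram() -> HashMap<u32, u64> {
        HashMap::new() // stub; the real thread owns the NVML handle
    }

    // A plain OS thread does the blocking NVML work and ships samples
    // to the async UI over a channel, so the tokio loop never blocks.
    fn spawn_nvml_thread(tx: mpsc::Sender<HashMap<u32, u64>>) {
        thread::spawn(move || loop {
            // blocking_send is the sync-side counterpart of send().await;
            // safe here because this thread is not on the tokio runtime.
            if tx.blocking_send(poll_vram()).is_err() {
                break; // receiver dropped: the UI is gone, stop polling
            }
            thread::sleep(Duration::from_secs(1));
        });
    }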

The VRAM-per-container mapping is re-derived each tick from cgroup v2 scopes, which handles container restarts without special-casing.
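
Concretely, each tick can fold the fresh NVML pid-to-bytes sample into a per-container map using container_id_for_pid from the Background sketch; a restarted container simply reappears under its new scope:

    use std::collections::HashMap;

    // Re-derive container_id -> VRAM bytes from this tick's sample;
    // PIDs whose cgroup no longer matches a docker scope drop out.
    fn vram_by_container(pid_vram: &HashMap<u32, u64>) -> HashMap<String, u64> {
        let mut out = HashMap::new();
        for (&pid, &bytes) in pid_vram {
            if let Some(id) = container_id_for_pid(pid) {
                *out.entry(id).or_insert(0) += bytes;
            }
        }
        out
    }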

Linux only; requires cgroup v2 and systemd.
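
A startup probe for the cgroup requirement can check for the unified hierarchy, since cgroup.controllers exists only at a cgroup v2 mount root (a sketch, not Boje's actual check):

    use std::path::Path;

    // On a pure cgroup v2 host, /sys/fs/cgroup exposes
    // cgroup.controllers at its root; cgroup v1 mounts do not.
    fn cgroup_v2_available() -> bool {
        Path::new("/sys/fs/cgroup/cgroup.controllers").exists()
    }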