sci-misc/llama-swap
Reliable LLM model swapping proxy for llama.cpp / vllm / etc.
ChangeLog
commit 2414ee242a98fcc2f74ef360a635751eafcec09b
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Mon Jun 1 11:59:47 2026 +0200
sci-misc/llama-swap: add 222
Two upstream commits: Unix process-group + Windows job-object handling
for clean child-process shutdown, plus an SSE goroutine race fix.
go.mod baseline 1.26.1 unchanged; golang.org/x/sys promoted from
indirect to direct (the new process-management code imports it).
commit c08854619280f171384e02f12c2b761741fcf076
Author: Raukaan Cogbrother <cogbrother@raukaan.local>
Date: Sun May 31 11:03:32 2026 +0200
sci-misc/llama-swap: add 220
Two upstream commits: load-testing TUI added (pulls bubbletea / lipgloss /
bubbles + transitive deps into go.sum) plus concurrency-middleware JSON
payload fix. go.mod baseline stays at 1.26.1.
commit 97b222aeced2669a0c84267c1de0b8a6574292e2
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Wed May 27 13:14:09 2026 +0200
sci-misc/llama-swap, sci-ml/fastflowlm: add openrc + systemd services
Both get supervise-daemon-driven openrc init.d/conf.d files and a
systemd per-instance template (pkg@<user>.service). Runners are
USE-gated and default off; the existing manual-start flow remains the
unchanged baseline.
llama-swap openrc refuses to start until LLAMA_SWAP_USER is set;
auto-derives LLAMA_SWAP_CONFIG from that user's $HOME via getent if
unset. Listener defaults to 127.0.0.1:8080 so a fresh install doesn't
expose the LLM API to the LAN. systemd unit has no default
LLAMA_SWAP_CONFIG -- systemd's %h resolves to /root for system-
manager units and /home/%i would bake in a passwd layout we can't
promise -- so EnvironmentFile=/etc/default/llama-swap@%i is required.
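A rough Go restatement of the llama-swap openrc start-up rules above. It is not the init script (which is shell), only its decision order. LLAMA_SWAP_LISTEN and the config path relative to $HOME are hypothetical, since neither is named here; homeOf stands in for the getent lookup.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/user"
	"path/filepath"
)

// Loopback-only default, so a fresh install is not reachable from the LAN.
const defaultListen = "127.0.0.1:8080"

// HYPOTHETICAL config location under the service user's home.
const configRel = ".config/llama-swap/config.yaml"

type settings struct {
	User, Config, Listen string
}

// resolve refuses to proceed without a user, derives the config path from
// that user's home when unset, and defaults the listener to loopback.
func resolve(getenv func(string) string, homeOf func(string) (string, error)) (settings, error) {
	s := settings{
		User:   getenv("LLAMA_SWAP_USER"),
		Config: getenv("LLAMA_SWAP_CONFIG"),
		Listen: getenv("LLAMA_SWAP_LISTEN"), // hypothetical variable name
	}
	if s.User == "" {
		return s, errors.New("LLAMA_SWAP_USER is not set; refusing to start")
	}
	if s.Config == "" {
		home, err := homeOf(s.User)
		if err != nil {
			return s, fmt.Errorf("cannot look up home of %q: %w", s.User, err)
		}
		s.Config = filepath.Join(home, configRel)
	}
	if s.Listen == "" {
		s.Listen = defaultListen
	}
	return s, nil
}

func main() {
	// os/user plays the role of `getent passwd <user>`.
	lookup := func(name string) (string, error) {
		u, err := user.Lookup(name)
		if err != nil {
			return "", err
		}
		return u.HomeDir, nil
	}
	s, err := resolve(os.Getenv, lookup)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", s)
}
```

Injecting getenv and homeOf keeps the rules testable without a real passwd entry.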
fastflowlm openrc refuses to start until FLM_USER is set; rc_ulimit
'-l unlimited' (and LimitMEMLOCK=infinity in systemd) is needed
because flm mlocks NPU buffers. systemd ExecStart goes through
/bin/sh -c with $$ escapes so ${FLM_PORT:+--port "$FLM_PORT"}
parameter expansion runs in the shell -- systemd's variable parser
has no :+ semantics.
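A small Go model of the two steps involved in the fastflowlm ExecStart line. systemd first turns each "$$" into a literal "$", and /bin/sh then evaluates ${FLM_PORT:+...}, which yields the words only when FLM_PORT is set and non-empty. Both helpers are illustrative and hypothetical. The first models nothing but the "$$" rule, not systemd's ${VAR} substitution or specifiers.

```go
package main

import (
	"fmt"
	"strings"
)

// systemdUnescape models only the "$$" -> "$" rule for ExecStart lines.
func systemdUnescape(line string) string {
	return strings.ReplaceAll(line, "$$", "$")
}

// portArgs emulates ${FLM_PORT:+--port "$FLM_PORT"}: with the colon form,
// unset and empty are treated alike and expand to no words at all.
func portArgs(flmPort string) []string {
	if flmPort == "" {
		return nil
	}
	return []string{"--port", flmPort}
}

func main() {
	fragment := `$${FLM_PORT:+--port "$$FLM_PORT"}`
	fmt.Println(systemdUnescape(fragment)) // what the shell actually sees
	fmt.Println(portArgs(""), portArgs("8000"))
}
```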
Hardening:
- llama-swap: ProtectSystem=full rather than =strict so backends
spawned by it (llama.cpp et al.) can still write to ~/.cache/.
- fastflowlm: deliberately omits ProtectKernelTunables (NPU power-
mode may touch /sys/) and MemoryDenyWriteExecute (XDNA path may
use JIT); revisit once empirically verified safe.
Service files live at files/<pkg>.service (no @) because pkgcheck
BannedCharacter rejects @ in files/* filenames; systemd_newunit's
target arg adds the @ at install.
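The resulting unit naming can be sketched as plain string mapping. This is a hypothetical helper, not part of the ebuild. It assumes simple user names that need no systemd unit-name escaping, and the EnvironmentFile path shown applies only to llama-swap, which is the only package whose path is stated above.

```go
package main

import (
	"fmt"
	"strings"
)

// installedTemplate is the name files/<pkg>.service is installed under;
// the @ is introduced only at install time.
func installedTemplate(pkg string) string {
	return pkg + "@.service"
}

// instanceUnit is the unit enabled for one user, e.g. llama-swap@alice.service.
func instanceUnit(pkg, user string) string {
	return strings.TrimSuffix(installedTemplate(pkg), ".service") + user + ".service"
}

// envFile expands %i in EnvironmentFile=/etc/default/llama-swap@%i.
func envFile(user string) string {
	return strings.ReplaceAll("/etc/default/llama-swap@%i", "%i", user)
}

func main() {
	fmt.Println(installedTemplate("llama-swap"))
	fmt.Println(instanceUnit("llama-swap", "alice"))
	fmt.Println(envFile("alice"))
}
```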
commit 6f17f543c81c02adddb622daad1d612e41726efa
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Mon May 25 20:54:28 2026 +0200
sci-misc/llama-swap: new package, 217
LLM model-swapping HTTP proxy for llama.cpp / vllm / mlx-server / etc.
Single Go binary; routes OpenAI/Anthropic-compatible API requests to
the right backend and lifecycle-manages those backends on demand so a
single endpoint can serve many models without keeping them all
GPU-resident.
Source build via go-module eclass against a vendored bundle hosted
on extra-stuff (sci-misc/llama-swap/llama-swap-217.tar.xz, tag
llama-swap-217-r0-0). The bundle is the upstream v217 tag plus
`go mod vendor`, generated locally so the in-tree build is
network-sandbox-clean.
Upstream embeds a Svelte web UI via `//go:embed ui_dist`. That UI
needs npm+vite, which can't sanely be vendored alongside the Go
modules. USE=ui pulls in net-libs/nodejs and runs `npm ci && npm
run build` at compile time, with the network sandbox lifted via
RESTRICT=network-sandbox (the same approach sci-misc/llama-cpp uses
for its own webui). Default
USE=-ui stubs proxy/ui_dist/ with a "rebuild with USE=ui" index.html
so the //go:embed directive is satisfied and the HTTP API still
functions standalone.
Build-verified both USE=-ui (16 MB binary) and USE=ui (20 MB binary,
adds ~4 MB embedded Svelte assets) against go 1.26.3 + nodejs 24 +
npm 11.