Gentoo Portage Overlays - sci-misc/llama-swap

sci-misc/llama-swap

Reliable LLM model swapping proxy for llama.cpp / vllm / etc.

https://github.com/mostlygeek/llama-swap

llama-swap-229

Keywords: ~amd64 ~arm64
USE flags: openrc systemd ui
License: MIT
Overlay: stuff

llama-swap-228

Keywords: ~amd64 ~arm64
USE flags: openrc systemd ui
License: MIT
Overlay: stuff

ChangeLog

commit c94782ed1bf4b3d0001979c7dd097ccc49a3fb09
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Mon Jun 22 11:04:39 2026 +0200

sci-misc/llama-swap: drop 225, 226

commit 1e76fda35a3da52098f74e698d5869cdef46b576
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Mon Jun 22 11:03:51 2026 +0200

sci-misc/llama-swap: add 229

Tracks upstream tag v229 (activity-page column, failed-request
capture, upstream.ignorePaths). go.mod/go.sum are unchanged from
228, so the vendored module set is identical; the bundle adds only
the two new internal/config/upstream.go sources. Vendored bundle
hosted on extra-stuff under tag llama-swap-229-r0-0.

commit 890a8e28f326ffad8eb3c497554f55b917bc12f1
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Sat Jun 20 15:49:48 2026 +0200

sci-misc/llama-swap: Keyword 225 for ~arm64

commit 8d65c006954848e63aa19a4c2023313b281d1162
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Thu Jun 18 10:36:01 2026 +0200

sci-misc/llama-swap: Keyword 228 for ~arm64

commit cba727e66f575dbc0aa4efdbb56b7a86ed6b56bd
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Wed Jun 17 17:19:05 2026 +0200

sci-misc/llama-swap: Keyword 226 for ~arm64

Go proxy/model-swapper for llama.cpp servers; the Go toolchain targets
aarch64 natively. Architecture-independent at the source level.

commit 01ba1c8cd4a838e0b83942d5c3535a9f3d702349
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Thu Jun 18 08:12:13 2026 +0200

sci-misc/llama-swap: add 228

commit d5c80e5086cb2c7fb99419cff14763cc434e90d4
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Tue Jun 16 14:59:40 2026 +0200

sci-misc/llama-swap: drop 220, 223, 224

commit 5762a5c6a505ea89155535c4c6faf971467f613d
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Mon Jun 15 15:47:40 2026 +0200

sci-misc/llama-swap: add 226

commit 9fca9a65a3b6af343e7ee794e94dbbb95b953d70
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Sun Jun 14 12:34:59 2026 +0200

sci-misc/llama-swap: add 225

commit 9da24ddb97f79fe7061e5281e07f02e774f919d9
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Wed Jun 10 19:49:54 2026 +0200

sci-misc/llama-swap: add 224

commit addcce6085de07582e1a16f95623567f4ffa4f71
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Sat Jun 6 20:09:34 2026 +0200

sci-misc/llama-swap: drop 222

commit de38c4f79f8ae72de8e33f22b8f9c3a4ea4a8bed
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Thu Jun 4 10:53:35 2026 +0200

sci-misc/llama-swap: drop 217

commit bbf67b93a88eda51efb02c849646f1664530dbc7
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Thu Jun 4 10:53:25 2026 +0200

sci-misc/llama-swap: add 223

commit 2414ee242a98fcc2f74ef360a635751eafcec09b
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Mon Jun 1 11:59:47 2026 +0200

sci-misc/llama-swap: add 222

Two upstream commits: Unix process-group + Windows job-object handling
for clean child-process shutdown, plus an SSE goroutine race fix.
go.mod baseline 1.26.1 unchanged; golang.org/x/sys promoted from
indirect to direct (the new process-management code imports it).

commit c08854619280f171384e02f12c2b761741fcf076
Author: Raukaan Cogbrother <cogbrother@raukaan.local>
Date: Sun May 31 11:03:32 2026 +0200

sci-misc/llama-swap: add 220

Two upstream commits: load-testing TUI added (pulls bubbletea / lipgloss /
bubbles + transitive deps into go.sum) plus concurrency-middleware JSON
payload fix. go.mod baseline stays at 1.26.1.

commit 97b222aeced2669a0c84267c1de0b8a6574292e2
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Wed May 27 13:14:09 2026 +0200

sci-misc/llama-swap, sci-ml/fastflowlm: add openrc + systemd services

Both get a supervise-daemon-driven openrc init.d/conf.d pair and a
systemd per-instance template (pkg@<user>.service). Runners are
USE-gated and default off; the existing manual-start flow remains
the baseline.

llama-swap openrc refuses to start until LLAMA_SWAP_USER is set;
auto-derives LLAMA_SWAP_CONFIG from that user's $HOME via getent if
unset. Listener defaults to 127.0.0.1:8080 so a fresh install doesn't
expose the LLM API to the LAN. systemd unit has no default
LLAMA_SWAP_CONFIG -- systemd's %h resolves to /root for system-
manager units and /home/%i would bake in a passwd layout we can't
promise -- so EnvironmentFile=/etc/default/llama-swap@%i is required.
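
The openrc behaviour described above (refuse to start without LLAMA_SWAP_USER, then derive the config path from that user's home via getent) might look roughly like the following sketch. It is not the shipped init script: the function names and the config file location under the home directory are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: helper names and the config file location are hypothetical.

# Print the home directory (6th colon-separated field) of a passwd(5) entry.
home_from_passwd_entry() {
    printf '%s\n' "$1" | cut -d: -f6
}

# Print a config path for the given user, or fail if the user is unset/unknown.
derive_llama_swap_config() {
    user=$1
    if [ -z "$user" ]; then
        echo "LLAMA_SWAP_USER must be set" >&2
        return 1
    fi
    entry=$(getent passwd "$user") || return 1
    home=$(home_from_passwd_entry "$entry")
    [ -n "$home" ] || return 1
    # Assumed location; the real default may differ.
    printf '%s\n' "$home/.config/llama-swap/config.yaml"
}

# Parsing a sample entry, independent of the local passwd database:
home_from_passwd_entry 'llm:x:1001:1001:LLM runner:/home/llm:/bin/sh'   # prints /home/llm
```

An init script would presumably call such a helper only when LLAMA_SWAP_CONFIG is empty, so that an explicit setting in conf.d always wins.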

fastflowlm openrc refuses to start until FLM_USER is set; rc_ulimit
'-l unlimited' (and LimitMEMLOCK=infinity in systemd) is needed
because flm mlocks NPU buffers. systemd ExecStart goes through
/bin/sh -c with $$ escapes so ${FLM_PORT:+--port "$FLM_PORT"}
parameter expansion runs in the shell -- systemd's variable parser
has no :+ semantics.
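
The `:+` expansion mentioned above can be illustrated with a small POSIX shell sketch. Only FLM_PORT and the --port flag come from the commit message; the function name `flm_port_args` is hypothetical and exists purely to show how many arguments the expansion produces. In the unit file itself each `$` would be written as `$$` so that systemd passes it through to the shell literally.

```shell
#!/bin/sh
# Sketch only: flm_port_args is a hypothetical helper for demonstration.

# Print "<argument count>:<arguments>" for the ${VAR:+word} expansion.
flm_port_args() {
    # ${FLM_PORT:+...} expands to nothing when FLM_PORT is unset or empty,
    # and to the two words --port <value> otherwise.
    set -- ${FLM_PORT:+--port "$FLM_PORT"}
    echo "$#:$*"
}

FLM_PORT=""
flm_port_args        # prints 0:

FLM_PORT=8000
flm_port_args        # prints 2:--port 8000
```

Because an unset or empty port contributes no arguments at all, the command line never ends up with a dangling `--port` flag.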

Hardening:
- llama-swap: ProtectSystem=full rather than =strict so backends
  spawned by it (llama.cpp et al.) can still write to ~/.cache/.
- fastflowlm: deliberately omits ProtectKernelTunables (NPU
  power-mode may touch /sys/) and MemoryDenyWriteExecute (XDNA path
  may use JIT); revisit once empirically verified safe.

Service files live at files/<pkg>.service (no @) because pkgcheck
BannedCharacter rejects @ in files/* filenames; systemd_newunit's
target arg adds the @ at install.

commit 6f17f543c81c02adddb622daad1d612e41726efa
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Mon May 25 20:54:28 2026 +0200

sci-misc/llama-swap: new package, 217

LLM model-swapping HTTP proxy for llama.cpp / vllm / mlx-server / etc.
Single Go binary; routes OpenAI/Anthropic-compatible API requests to
the right backend and lifecycle-manages those backends on demand so a
single endpoint can serve many models without keeping them all
GPU-resident.

Source build via go-module eclass against a vendored bundle hosted
on extra-stuff (sci-misc/llama-swap/llama-swap-217.tar.xz, tag
llama-swap-217-r0-0). The bundle is the upstream v217 tag plus
`go mod vendor`, generated locally so the in-tree build is
network-sandbox-clean.

Upstream embeds a Svelte web UI via `//go:embed ui_dist`. That UI
needs npm+vite, which can't sanely be vendored alongside the Go
modules. USE=ui pulls in net-libs/nodejs and runs `npm ci && npm
run build` at compile time, with the network sandbox lifted via
RESTRICT=network-sandbox (the same shape sci-misc/llama-cpp uses for
its own webui). Default USE=-ui stubs proxy/ui_dist/ with a
"rebuild with USE=ui" index.html so the //go:embed directive is
satisfied and the HTTP API still functions standalone.
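
The USE=-ui stub step could be sketched as below. This is an illustration, not the ebuild's actual code: the function name `stub_ui_dist` and the exact HTML are assumptions, and only the proxy/ui_dist/ path and the "rebuild with USE=ui" message come from the description above. The point is that the embed pattern needs at least one file to match, so a placeholder index.html is enough for the Go build to proceed.

```shell
#!/bin/sh
# Sketch only: stub_ui_dist and the placeholder markup are hypothetical.

# Create <source root>/proxy/ui_dist/index.html as a placeholder page.
stub_ui_dist() {
    dir=$1/proxy/ui_dist
    mkdir -p "$dir" || return 1
    cat > "$dir/index.html" <<'EOF'
<!doctype html>
<title>llama-swap</title>
<p>Web UI not built; rebuild with USE=ui.</p>
EOF
}

# Demonstrate against a throwaway directory instead of a real source tree.
demo_root=$(mktemp -d)
stub_ui_dist "$demo_root" && ls "$demo_root/proxy/ui_dist"   # prints index.html
rm -rf "$demo_root"
```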

Build-verified with both USE=-ui (16 MB binary) and USE=ui (20 MB
binary, adds ~4 MB of embedded Svelte assets) against go 1.26.3 +
nodejs 24 + npm 11.