dev-python/flashinfer-python
FlashInfer: kernel library for LLM serving (Python frontend)
flashinfer-python-0.6.12  ~amd64  USE: python_single_target_python3_12 python_single_target_python3_13 python_single_target_python3_14
License: Apache-2.0  Overlay: stuff
flashinfer-python-0.6.11_p3  ~amd64  USE: python_single_target_python3_12 python_single_target_python3_13 python_single_target_python3_14
License: Apache-2.0  Overlay: stuff
flashinfer-python-0.6.11_p2  ~amd64  USE: python_single_target_python3_12 python_single_target_python3_13 python_single_target_python3_14
License: Apache-2.0  Overlay: stuff
ChangeLog
commit 68de1b47a23f51fed8162033154ed94867742bd9
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Sat May 30 14:21:06 2026 +0200
dev-python/flashinfer-python: bump 0.6.11_p3 -> 0.6.12
Pairs with the flashinfer-cubin bump. Upstream's requires_dist diff
shows only optional-extra changes (new "nvep" extra pulling
cuda-python; the "cuda-tile[tileiras]" extra is satisfied by our
existing cuda-tile-bin dep). No baseline RDEPEND changes.
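The "extras-only" audit can be sketched as splitting each Requires-Dist entry into baseline requirements and extra-gated ones, then comparing only the baseline side. This is a minimal stdlib-only sketch, not the tooling actually used for the bump; it assumes markers use the common `extra == "name"` form, and the sample entries in the usage note are illustrative.

```python
import re

# Matches an extra-gating marker such as: extra == "nvep"
# Simplification: only the common `extra == "<name>"` form is recognised.
_EXTRA_RE = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def split_requires_dist(entries):
    """Split Requires-Dist strings into (baseline, {extra: [requirements]}).

    Baseline entries keep only the requirement part; any non-extra marker
    (e.g. a python_version guard) is dropped in this sketch.
    """
    baseline = []
    extras = {}
    for entry in entries:
        req, _, marker = entry.partition(";")
        req = req.strip()
        match = _EXTRA_RE.search(marker)
        if match:
            extras.setdefault(match.group(1), []).append(req)
        else:
            baseline.append(req)
    return baseline, extras


def baseline_changed(old_entries, new_entries):
    """True if the non-extra (baseline RDEPEND-relevant) requirements differ."""
    old_base, _ = split_requires_dist(old_entries)
    new_base, _ = split_requires_dist(new_entries)
    return sorted(old_base) != sorted(new_base)
```

For example, adding `'cuda-python; extra == "nvep"'` to an otherwise unchanged list leaves `baseline_changed` false and shows up only under the `nvep` key.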
commit f3c33810e93cb9e856ed026eb4f1a93f2b6cac3c
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Sun May 17 00:52:23 2026 +0200
dev-python/flashinfer-python: align 0.6.8_p1 with vllm cuda.txt
Upstream vllm 0.21.0's cuda.txt enforces:
- nvidia-cudnn-frontend>=1.13.0,<1.19.0 (breaking changes in 1.19)
- nvidia-cutlass-dsl==4.4.2 (exact pin)
Apply both to 0.6.8_p1 so vllm[cuda] resolves. Newer 0.6.11.x
versions are left alone: they may have been retested against
cudnn-frontend >=1.19 / cutlass-dsl >=4.5, so re-audit them at
consumer-pin time before tightening.
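The two vllm pins quoted above can be checked against a candidate version with a small helper. This sketch handles only plain dotted-integer releases (no pre/post/dev segments) and drops trailing zeros so that `1.19` and `1.19.0` compare equal, in the spirit of PEP 440 release-segment comparison. It is an illustration, not how vllm or pip evaluate the constraints.

```python
def release(version):
    """Parse a plain 'X.Y.Z' release into a comparable tuple.

    Trailing zero components are dropped so '1.19' == '1.19.0'.
    Pre-release, post-release and local segments are NOT supported.
    """
    parts = [int(p) for p in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


# Constraints as quoted from vllm 0.21.0's cuda.txt in the commit message.
PINS = {
    # nvidia-cudnn-frontend>=1.13.0,<1.19.0
    "nvidia-cudnn-frontend": lambda v: release("1.13.0") <= release(v) < release("1.19.0"),
    # nvidia-cutlass-dsl==4.4.2
    "nvidia-cutlass-dsl": lambda v: release(v) == release("4.4.2"),
}


def satisfies(name, version):
    """Return True if `version` of `name` meets the pinned constraint."""
    return PINS[name](version)
```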
commit a53be869a67163ee151dee638983986667793284
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Sat May 16 13:11:07 2026 +0200
dev-python/flashinfer-python: add 0.6.11_p3
PyPI sdist bump: .post2 -> .post3 (our _p2 -> _p3). Pulls flashinfer-cubin ~_p3 via
the version-coupled cond-dep.
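A version-coupled dep with the PMS `~` operator matches the same version at any ebuild revision. The sketch below models that rule, assuming the cond-dep is written along the lines of `~dev-python/flashinfer-cubin-${PV}` (a hypothetical spelling; the actual ebuild line is not shown here). It compares strings after stripping `-rN`, so it ignores PMS numeric normalisation such as `_p03` vs `_p3`.

```python
import re

# Trailing ebuild revision, e.g. "-r1".
_REVISION = re.compile(r"-r\d+$")


def tilde_matches(dep_version, candidate):
    """Model of the PMS `~` operator: equal version, any revision.

    `dep_version` is the version in the dependency (no revision);
    `candidate` is an available version, possibly with -rN.
    """
    return _REVISION.sub("", dep_version) == _REVISION.sub("", candidate)
```

Under this rule, a bump from _p2 to _p3 on the flashinfer-python side forces the matching flashinfer-cubin _p3, while a later -r1 of that cubin ebuild still satisfies it.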
commit 3e0f976fd200159cc40eac40cee48f72f5c8ac58
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Thu May 14 12:01:00 2026 +0200
dev-python/flashinfer-python: add 0.6.11_p2
commit 9b5ba1580af42fb72b97b24231dfae2f64d1a5b3
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Wed May 13 14:34:48 2026 +0200
dev-python/flashinfer-python: disable py3.11
commit b8143a6ca85d0e978675a1f683b1893cccc119af
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Sun May 10 15:26:08 2026 +0200
dev-python/flashinfer-python: switch to DISTUTILS_SINGLE_IMPL
sci-ml/pytorch is SINGLE_IMPL; a multi-impl consumer with bare
$ produces python_targets_python3_*(-)? USE deps that the
single-impl child can't expose. The pytorch dep moves to bare
$ and the rest of the (multi-impl) deps move
into python_gen_cond_dep.
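The mismatch can be made concrete with a simplified model of the two expansions. The sketch assumes the standard python-r1 / python-single-r1 conventions (`${PYTHON_USEDEP}`, `python_targets_*`, `python_single_target_*`) and takes the implementations from this package's USE flags; the real eclass output may differ in detail, and these helper names are hypothetical.

```python
# Implementations taken from the package's python_single_target_* USE flags.
IMPLS = ("python3_12", "python3_13", "python3_14")


def multi_impl_usedep(impls=IMPLS):
    """Roughly what a multi-impl ${PYTHON_USEDEP} expands to.

    A single-impl package such as pytorch has no python_targets_* flags,
    and (-) makes a missing flag count as disabled, so these USE deps
    cannot be satisfied once the consumer enables a target.
    """
    return ",".join(f"python_targets_{impl}(-)?" for impl in impls)


def gen_cond_dep(dep, impls=IMPLS):
    """Simplified model of python-single-r1's python_gen_cond_dep.

    Emits one python_single_target_<impl>? group per implementation,
    substituting ${PYTHON_USEDEP} with that impl's python_targets flag,
    which multi-impl dependencies do provide.
    """
    groups = []
    for impl in impls:
        body = dep.replace("${PYTHON_USEDEP}", f"python_targets_{impl}(-)")
        groups.append(f"python_single_target_{impl}? ( {body} )")
    return " ".join(groups)
```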
commit a1b5a8250d662e8d8600990d5dd9578bba113622
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Sun May 10 16:49:49 2026 +0200
dev-python/flashinfer-python: switch to DISTUTILS_SINGLE_IMPL
sci-ml/pytorch is SINGLE_IMPL; a multi-impl consumer with bare
$ produces python_targets_python3_*(-)? USE deps that the
single-impl child can't expose. The pytorch dep moves to bare
$ and the rest of the (multi-impl) deps move
into python_gen_cond_dep.
(Hand-port of the same conversion landing on 0.6.8_p1 to our newer
0.6.11 ebuild.)
commit 97b49dbfa7221a4d42d6f9ac5264f2f5a97d5c5c
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Sun May 10 15:04:33 2026 +0200
dev-python/flashinfer-python: add 0.6.11
commit b4cf378def63a936ade846dce872c3faa9bd3bcf
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Thu May 7 18:38:08 2026 +0200
dev-python/flashinfer-python: new package, 0.6.8.post1
Tier 5 of the vllm CUDA target packaging cycle: FlashInfer's Python
frontend, the kernel library through which vllm's CUDA target
dispatches attention / fused-MoE / GEMM workloads.
Pure-Python install at packaging time: the C++ csrc/include and
vendored cutlass / spdlog ride along as data files for the runtime
JIT compilation pipeline. No nvcc invocation at install, so the
gcc-15 host-pin from tilelang isn't needed here. (At first JIT use,
end-users hit the same nvcc/gcc-15 compatibility window — that's
their build environment's responsibility.)
Two ebuild-side fixes:
* Same upstream py-modules-leak issue as torch-c-dlpack-ext, but with
TWO modules — pyproject.toml's [tool.setuptools] py-modules =
["build_backend", "build_utils"] would ship both PEP-517 backend
helpers at the top of site-packages. Drop both in
python_install_all.
* Upstream's requirements.txt lists nvidia-ml-py as the runtime dep,
but at first import flashinfer/utils.py does `import pynvml` —
the legacy module name. ::gentoo's dev-python/nvidia-ml-py
installs a `pynvml.py` shim at /usr/lib/pythonX.Y/site-packages/pynvml.py,
so the runtime dep is satisfied as long as nvidia-ml-py
is the actually-installed package. RDEPEND already names it; flagging
here so a future pip-equivalent lookup doesn't regress to a separate
`dev-python/pynvml` (which doesn't exist in ::gentoo).
PV maps upstream's .post1 to _p1 (Gentoo PMS version syntax has no
.postN form); the pypi eclass handles the reverse translation automatically.
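The suffix mapping is mechanical. This is a minimal sketch that covers only the `_pN` / `.postN` pair used by this package; the pypi eclass also translates other suffixes (alpha/beta/rc), which are deliberately not modelled here, and the function names are hypothetical.

```python
import re


def gentoo_to_pypi(pv):
    """Map a Gentoo PV like '0.6.11_p3' to the PyPI version '0.6.11.post3'."""
    return re.sub(r"_p(\d+)$", r".post\1", pv)


def pypi_to_gentoo(version):
    """Map a PyPI version like '0.6.8.post1' to the Gentoo PV '0.6.8_p1'."""
    return re.sub(r"\.post(\d+)$", r"_p\1", version)
```

Versions without a post-release suffix, such as 0.6.12, pass through unchanged in both directions.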