gpo.zugaina.org

dev-python/tilelang

Tile-level programming language for high-performance ML kernels

  • tilelang-0.1.10
    ~amd64
    python_single_target_python3_12 python_single_target_python3_13 python_single_target_python3_14 debug

    License: MIT
    Overlay: stuff
  • tilelang-0.1.9
    ~amd64
    python_single_target_python3_12 python_single_target_python3_13 python_single_target_python3_14 debug

    License: MIT
    Overlay: stuff

ChangeLog

commit 00e82f29e8e5359d45fd0c4498e2f881668e06ce
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Mon May 25 14:00:21 2026 +0200

dev-python/tilelang: add 0.1.10

Patch bump from 0.1.9. Upstream tightened apache-tvm-ffi pin from
~=0.1.0,>=0.1.2 to ~=0.1.0,>=0.1.10; mirror that with a >=0.1.10
floor on the python_gen_cond_dep entry (we already ship 0.1.11 in
the overlay so resolution stays clean). Other pyproject.toml deps
unchanged. Build-verified with python3_13 single-impl + the
existing CMAKE_CUDA_HOST_COMPILER / TILELANG_USE_CUDA_STUBS workarounds;
all four dated rationale comments re-verified against 0.1.10.
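
To make the pin arithmetic concrete: under PEP 440, `~=0.1.0` means `>=0.1.0, ==0.1.*`, so the combined upstream constraint admits only 0.1.x releases from 0.1.10 upward. The bash sketch below checks a few versions against that rule. The helper names `ver_ge` and `satisfies_pin` are hypothetical, and the comparison relies on GNU `sort -V` rather than a real PEP 440 parser.

```shell
#!/usr/bin/env bash
# Sketch: does a version satisfy "~=0.1.0,>=0.1.10"?
# Assumes plain numeric X.Y.Z versions and GNU sort -V.

# Succeeds if $1 >= $2 in version order.
ver_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# ~=0.1.0 expands to ">=0.1.0, ==0.1.*"; the second clause raises the floor.
satisfies_pin() {
    local v=$1
    [[ $v == 0.1.* ]] && ver_ge "$v" 0.1.0 && ver_ge "$v" 0.1.10
}

for v in 0.1.2 0.1.10 0.1.11 0.2.0; do
    if satisfies_pin "$v"; then echo "$v ok"; else echo "$v rejected"; fi
done
```

The 0.1.11 already shipped in the overlay passes, which is why the tighter floor does not disturb dependency resolution.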

commit 51472bd93965ef04435f1165226d7a499e1bc72f
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Wed May 13 14:42:59 2026 +0200

dev-python/tilelang: drop stale python3_11 conditional

python_single_target_python3_11 was removed from PYTHON_COMPAT in the
python3_11 sweep but the torch-c-dlpack-ext conditional block was not
cleaned up, causing an UnstatedIuse pkgcheck error.
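
The failure mode here is a USE-conditional that names a flag no longer generated from PYTHON_COMPAT. The bash sketch below finds such leftovers by hand. The helper `find_stale_flags` and the dependency string are illustrative assumptions, not the real ebuild contents, and this is not how pkgcheck itself works.

```shell
#!/usr/bin/env bash
# Sketch: list python_single_target_* flags used in a dependency string
# that no longer correspond to an entry in PYTHON_COMPAT.
# The dependency string below is illustrative, not the real ebuild.

find_stale_flags() {
    local deps=$1; shift
    local flag impl c ok
    for flag in $(grep -oE 'python_single_target_python3_[0-9]+' <<<"$deps" | sort -u); do
        impl=${flag#python_single_target_}
        ok=0
        for c in "$@"; do
            [[ $c == "$impl" ]] && ok=1
        done
        # Report flags whose implementation is not in the compat list.
        (( ok )) || echo "$flag"
    done
    return 0
}

PYTHON_COMPAT=( python3_12 python3_13 python3_14 )
DEPS='
    python_single_target_python3_11? ( dev-python/torch-c-dlpack-ext )
    python_single_target_python3_12? ( dev-python/torch-c-dlpack-ext )
'
find_stale_flags "$DEPS" "${PYTHON_COMPAT[@]}"
```

With the lists above, the only flag reported is the python3_11 one, matching the conditional this commit removes.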

commit 5458d487801025adc249579494ecf1bb7eb68c52
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Wed May 13 14:35:35 2026 +0200

dev-python/tilelang: disable py3.11

commit 738368c3910b766e9fd40e870d6d78a5ceba313f
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Sun May 10 16:13:17 2026 +0200

dev-python/tilelang: fix z3 dep — use $

sci-mathematics/z3 is python-single-r1 (single-impl). The earlier
DISTUTILS_SINGLE_IMPL conversion left z3 inside python_gen_cond_dep
with [python,$], which silently auto-satisfies via
[X(-)?] when z3 doesn't have python_targets_* in IUSE. Move it out of
the multi-impl wrap and use [python,$] so the
Python-target match between tilelang and z3 is actually enforced.

commit 6ad25de09e790db9d83139a3d5611d32b6c04769
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Sun May 10 15:28:12 2026 +0200

dev-python/tilelang: switch to DISTUTILS_SINGLE_IMPL

sci-ml/pytorch is SINGLE_IMPL, and the existing python_targets_python3_*?
guards on the now-single-impl dev-python/torch-c-dlpack-ext need to flip
to python_single_target_python3_*?. Multi-impl helpers (apache-tvm-ffi,
cloudpickle, ml-dtypes, etc.) move into python_gen_cond_dep.
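
The guard flip described above is a mechanical rename of the USE-conditional prefix. The sketch below performs it with sed on a made-up dependency line. The function name `flip_guards` and the sample input are assumptions, and the real commit may well have been edited by hand.

```shell
#!/usr/bin/env bash
# Sketch: rewrite python_targets_python3_N? guards to
# python_single_target_python3_N? on stdin. Sample input is illustrative.

flip_guards() {
    sed -E 's/(^|[^_[:alnum:]])python_targets_(python3_[0-9]+)\?/\1python_single_target_\2?/g'
}

echo 'python_targets_python3_12? ( dev-python/torch-c-dlpack-ext )' | flip_guards
```

The leading character class keeps the pattern from matching inside a longer identifier.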

commit 655ba501a783d05d1463c4978bc50efbe0a409e3
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Thu May 7 18:25:53 2026 +0200

dev-python/tilelang: new package, 0.1.9

Tier 4 of the vllm CUDA target packaging cycle. Tile-level programming
language for high-performance ML kernels — used by flashinfer-python's
fused-attention paths and (via apache-tvm-ffi) by quack-kernels.

Build is scikit-build-core + a vendored full TVM source tree (407 MiB
under 3rdparty/tvm), plus vendored CUTLASS C++ and Composable-Kernel
headers. ~677 ninja steps; on this 24-thread / 31 GiB host the full
build runs ~10 min once unblocked.

Three ebuild-side fixes were needed to land:

* z3 lookup. Upstream's bundled cmake/pypi-z3/FindZ3.cmake searches
ONLY inside the PyPI z3-solver wheel's site-packages layout
(NO_DEFAULT_PATH). On Gentoo the headers + libz3 live at standard
/usr/include + /usr/lib64 paths. Pre-setting Z3_INCLUDE_DIR and
Z3_LIBRARY via DISTUTILS_ARGS short-circuits find_path /
find_library and lets the imported z3::libz3 target build cleanly
against ::gentoo's z3.

* nvcc host compiler. CUDA 13.2's crt/host_config.h hard-#errors when
__GNUC__ > 15, but this host's active gcc is 16. Pinning nvcc's
host compiler to /usr/bin/x86_64-pc-linux-gnu-g++-15 (slot 2 from
gcc-config) via -DCMAKE_CUDA_HOST_COMPILER and CUDAHOSTCXX keeps
the system slot at 16 while the CUDA toolchain stays in band.
Likely affects every other CUDA-source consumer in this stack.

* CUDA stub linkage. TILELANG_USE_CUDA_STUBS defaults ON and links
libtvm.so against tilelang's own libcudart_stub / libnvrtc_stub
for "portable wheel" lazy resolution. But there is no driver-API
stub, so direct calls to cuDeviceGetName etc. fail as undefined
symbols at import. Turning the option OFF makes libtvm.so link
directly (as NEEDED entries) against /opt/cuda's stubs/libcuda.so /
libcudart.so / libnvrtc.so; the SONAMEs resolve to the real
driver+runtime libs at runtime via /etc/ld.so.cache. We don't
ship portable wheels, so the loss of portability is moot.
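
Taken together, the three fixes amount to a handful of CMake definitions plus one environment variable. The sketch below is plain bash, not the overlay's actual ebuild. The helper names `host_gcc_ok` and `tilelang_cmake_args`, the `HOST_CXX` variable, and the `libz3.so` file name are assumptions; the paths, option names, and the `__GNUC__ > 15` limit come from the commit message.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the three workarounds described above.

# Host compiler that nvcc is pinned to (path from the commit message).
HOST_CXX=/usr/bin/x86_64-pc-linux-gnu-g++-15
export CUDAHOSTCXX=$HOST_CXX

# CUDA 13.2 rejects host compilers with __GNUC__ > 15.
host_gcc_ok() {
    local major=$1
    (( major <= 15 ))
}

# Print the CMake definitions, one per line.
tilelang_cmake_args() {
    local args=(
        # 1. z3 lookup: pre-seed the cache so FindZ3.cmake skips its search.
        -DZ3_INCLUDE_DIR=/usr/include
        -DZ3_LIBRARY=/usr/lib64/libz3.so   # file name is an assumption
        # 2. nvcc host compiler: stay within CUDA's supported gcc range.
        -DCMAKE_CUDA_HOST_COMPILER="$HOST_CXX"
        # 3. CUDA stubs: link against the real CUDA libraries instead.
        -DTILELANG_USE_CUDA_STUBS=OFF
    )
    printf '%s\n' "${args[@]}"
}

tilelang_cmake_args
```

In an ebuild these values would go into DISTUTILS_ARGS, as the z3 item notes; the function form here only makes the list easy to inspect.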

Z3-solver pin (<4.15.5) is intentionally NOT honored — ::gentoo only
carries z3-4.16.0; the cap reads as conservative ("tested up to here")
and tilelang imports + initializes cleanly against 4.16. Will revisit
if a runtime test surfaces a real incompatibility.

Accepting the NonsolvableDepsInStable false positive from
sci-mathematics/z3 / sci-ml/pytorch keyword stacks (same shape as
xgrammar etc.).