dev-python/tilelang

Tile-level programming language for high-performance ML kernels

  • tilelang-0.1.9
    ~amd64
    python_targets_python3_11 python_targets_python3_12 python_targets_python3_13 python_targets_python3_14 debug

    License: MIT
    Overlay: stuff

ChangeLog

commit 655ba501a783d05d1463c4978bc50efbe0a409e3
Author: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: Thu May 7 18:25:53 2026 +0200

dev-python/tilelang: new package, 0.1.9

Tier 4 of the vllm CUDA target packaging cycle. Tile-level programming
language for high-performance ML kernels — used by flashinfer-python's
fused-attention paths and (via apache-tvm-ffi) by quack-kernels.

Build is scikit-build-core + a vendored full TVM source tree (407 MiB
under 3rdparty/tvm), plus vendored CUTLASS C++ and Composable-Kernel
headers. ~677 ninja steps; on this 24-thread / 31 GiB host the full
build runs ~10 min once the fixes below are in place.

Three ebuild-side fixes were needed to land:

* z3 lookup. Upstream's bundled cmake/pypi-z3/FindZ3.cmake searches
ONLY inside the PyPI z3-solver wheel's site-packages layout
(NO_DEFAULT_PATH). On Gentoo the headers + libz3 live at standard
/usr/include + /usr/lib64 paths. Pre-setting Z3_INCLUDE_DIR and
Z3_LIBRARY via DISTUTILS_ARGS short-circuits find_path /
find_library and lets the imported z3::libz3 target build cleanly
against ::gentoo's z3.

* nvcc host compiler. CUDA 13.2's crt/host_config.h hard-#errors when
__GNUC__ > 15, but this host's active gcc is 16. Pinning nvcc's
host compiler to /usr/bin/x86_64-pc-linux-gnu-g++-15 (slot 2 from
gcc-config) via -DCMAKE_CUDA_HOST_COMPILER and CUDAHOSTCXX keeps
the system slot at 16 while the CUDA toolchain stays in band.
Likely affects every other CUDA-source consumer in this stack.

* CUDA stub linkage. TILELANG_USE_CUDA_STUBS defaults ON and links
libtvm.so against tilelang's own libcudart_stub / libnvrtc_stub
for "portable wheel" lazy resolution. But there is no driver-API
stub, so direct calls to cuDeviceGetName etc. fail as undefined
symbols at import. Turning the option OFF links libtvm.so directly
(DT_NEEDED) against /opt/cuda's stubs/libcuda.so, libcudart.so and
libnvrtc.so; the SONAMEs resolve to the real driver and runtime
libs at runtime via /etc/ld.so.cache. We don't ship portable
wheels, so the loss of portability is moot.
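
Taken together, the three fixes might look roughly like the fragment
below. This is a sketch, not the overlay's actual ebuild: the libz3.so
file name, the cuda_host_cxx helper variable and the placement inside
src_configure are assumptions. The option names and paths are the ones
quoted above, and the fragment relies on DISTUTILS_ARGS reaching CMake
through scikit-build-core, as described in the z3 item.

```shell
#!/bin/bash
# Hypothetical ebuild fragment, written as plain bash so it can be read
# (and sourced) on its own. Not copied from the real tilelang ebuild.

# Assumed helper variable: the g++-15 path named in the commit message.
cuda_host_cxx="/usr/bin/x86_64-pc-linux-gnu-g++-15"

src_configure() {
	# Environment-variable counterpart of CMAKE_CUDA_HOST_COMPILER,
	# read by CMake on the first configure run.
	export CUDAHOSTCXX="${cuda_host_cxx}"

	DISTUTILS_ARGS=(
		# Pre-seed the cache so FindZ3.cmake's find_path/find_library
		# calls have nothing left to search for.
		# Assumption: the library file is named libz3.so.
		-DZ3_INCLUDE_DIR=/usr/include
		-DZ3_LIBRARY=/usr/lib64/libz3.so

		# Keep nvcc's host compiler inside the range CUDA 13.2 accepts.
		-DCMAKE_CUDA_HOST_COMPILER="${cuda_host_cxx}"

		# Link against the CUDA libraries directly instead of
		# tilelang's own lazy-resolution stubs.
		-DTILELANG_USE_CUDA_STUBS=OFF
	)

	# A real ebuild would hand control back to the eclass here,
	# e.g. by calling distutils-r1_src_configure.
}
```

Hard-coding /usr/lib64 and the g++-15 path mirrors the host described
here; a more portable ebuild would presumably derive both from the
profile and the installed gcc slots.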

The z3-solver pin (<4.15.5) is intentionally NOT honored: ::gentoo only
carries z3-4.16.0; the cap reads as conservative ("tested up to here")
and tilelang imports + initializes cleanly against 4.16. Will revisit
if a runtime test surfaces a real incompatibility.

Accepting the NonsolvableDepsInStable false positive from
sci-mathematics/z3 / sci-ml/pytorch keyword stacks (same shape as
xgrammar etc.).